The term “artificial intelligence” (AI) has been inescapable since OpenAI released ChatGPT to the public in November 2022. Although less conspicuous forms of AI had been in use for decades prior, whether on the back end of widely used tools such as Google Translate, underlying electrocardiogram interpretation, or in experimental settings, the introduction of AI chatbots like ChatGPT appears to have been an inflection point. There has also been plenty of public spectacle: Microsoft’s Bing chatbot, whose “Sydney” persona briefly captivated the world with a personality of its own;1 OpenAI CEO Sam Altman’s unexpected ouster, followed by his rapid reinstatement;2 the meteoric rise of AI-adjacent stocks fueling the financial markets;3 and entertaining, if eerie, demonstrations of new voice assistant and augmented reality products by OpenAI,4 Meta,5 and Apple.6
In the field of dermatology, dozens of specific AI use cases have already been demonstrated in experimental settings. A partial list includes the classification of pigmented skin lesions,7-9 systemic lupus erythematosus,10 infantile hemangiomas,11 diabetic foot ulcers,12 and onychomycosis;13 the use of histopathologic images to diagnose melanoma and other melanocytic neoplasms,14-19 basal cell carcinoma (BCC),20,21 BCC in Mohs frozen sections,22-24 squamous cell carcinoma (SCC) and seborrheic keratosis,25 and onychomycosis in nail clippings;26,27 the prediction of SOX-10 staining solely on the basis of hematoxylin- and eosin-stained histopathologic images;28 the triage of Mohs cases;29-31 the prediction of Mohs case complexity from preoperative tumor characteristics;30 and the identification of preferred reconstruction techniques from photos of surgical defects.32,33
Still, the majority of these studies were performed in highly simulated environments that may not reflect everyday usage, and the algorithms have largely remained inaccessible to the average practicing dermatologist and their staff. The introduction of large language models (LLMs, which power AI chatbots) like ChatGPT and Gemini has allowed laypeople to interact with AI regularly in ways they previously could not. Rather than face the task of sourcing the computing power required for algorithm development and going through the many steps necessary to train an algorithm, a user can now simply chat with a freely available online tool and get an instant response that generally seems “reasonable.”
Are dermatologists actually using AI in clinical practice? If so, how?
AI has certainly entered our specialty. In a recent survey of dermatologists about their usage of LLMs, a majority (51.7%) reported using AI chatbots on a daily or weekly basis. The vast majority (97%) reported that they had to manually edit the chatbots’ responses before use, and a similar proportion (92%) felt the tools had major limitations, such as privacy or accuracy concerns.34 Nonetheless, 88% said they would continue to use them for various tasks, and none felt the inaccuracies were severe enough to render the tools completely unusable.
While many trials of AI algorithms for skin cancer diagnosis have shown accuracy at least as good as that of dermatologists in carefully curated settings, the real test will be how clinicians use these tools in everyday practice. Just as a dermatoscope is a tool best used, or perhaps only usable, in partnership with a clinician, so too are the current AI algorithms for the clinical diagnosis of skin lesions.
Indeed, a recent systematic review and meta-analysis showed that, while AI assistance improves the accuracy of clinical skin cancer diagnosis for both dermatologists and non-dermatologists, clinician accuracy decreases when the AI assistance is inaccurate.35 The influence of AI assistance is weaker for dermatologists than for non-dermatologists, suggesting that dermatologists are less swayed by the algorithms, and the most experienced dermatologists are the least influenced by incorrect AI.36
This is no surprise: as dermatologists, we go through nearly a decade of postgraduate training carefully honing our clinical intuition. By the end, our reasoning is explainable, interpretable, and traceable. Explainability is the ability to explain our decisions in a layperson-friendly way (eg, describing the clinical morphology of a rash and using that to generate differential diagnoses). Interpretability is the intrinsic ability of our logic to be understood by others (eg, having formal systems for how we interpret dermatoscopic images, work up retiform purpura, or treat acne). Traceability is the ability to track and document the data, processes, and decisions made throughout our training (eg, our residency case logs, textbooks and curricula, or notes describing our thought processes in patients’ charts). These three principles are exactly what so many existing AI algorithms lack.37 Concerningly, non-dermatologists or mid-level practitioners, whose dermatologic training also generally lacks these elements, may well settle for an opaque algorithm, offering little or no pushback on diagnoses that make no sense. Hopefully, dermatologists and other discerning clinicians will continue to critically question AI tools (whether standalone devices or adjunctive assistants) as long as they lack transparency. The wild card, however, is what large private equity-run practices will tolerate, and whether AI use will become de rigueur, imposed on their “providers” to boost efficiency.
Are non-dermatologists using AI in clinical practice?
Though we don’t have much data on AI usage patterns among non-dermatologists, we know that their accuracy in diagnosing skin cancer improves greatly with AI assistance.35 Furthermore, the FDA recently authorized DermaSensor, the first AI-enabled medical device for skin cancer detection specifically intended for use by non-dermatologists.38 It is a handheld device slightly larger than an iPhone, sold on a monthly subscription ($199 per month for five patients, or $399 per month for unlimited users and patients). To assess a lesion, the user applies the device tip to the skin five times; the device then analyzes the lesion using elastic scattering spectroscopy and outputs a score from one to ten, ranging from “Monitor” at the low end to “Investigate Further” at the high end, with the latter indicating high concern for malignancy (BCC, SCC, or melanoma) and the need to refer to a dermatologist.
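In software terms, the device’s output reduces to a simple threshold rule over its one-to-ten score. Below is a minimal sketch in Python; the cutoff value and function name are hypothetical assumptions of mine, since the public materials describe the two output categories but not the exact score boundary:

# Hypothetical sketch of the device's triage logic; the cutoff is an
# assumption for illustration, not a published specification.
def triage(score: int, cutoff: int = 5) -> str:
    """Map a 1-10 spectroscopy score to a recommendation."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    # Higher scores indicate higher concern for BCC, SCC, or melanoma.
    if score > cutoff:
        return "Investigate Further: refer to dermatology"
    return "Monitor"

print(triage(2))  # Monitor
print(triage(9))  # Investigate Further: refer to dermatology

The point of the sketch is that the clinician sees only the category, not the underlying spectra or any rationale, which is relevant to the explainability concerns discussed above.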
DermaSensor’s validation study, DERM-SUCCESS, cited relatively high sensitivity (96.5%) but very low specificity (20.7%) for BCC, SCC, and melanoma, meaning that it detected 96.5% of the cancerous lesions selected for testing, but at the cost of a large number of false positives.39 A separate study of melanoma diagnosis alone, DERM-ASSESS III, demonstrated 95.5% sensitivity and 32.5% specificity.40 This is comparable to NeviSense, a similar FDA-approved AI device, which showed 95.9% sensitivity and 31.3% specificity in assisting dermatologists with melanoma diagnosis.41 Looking beyond the summary statistics: the DERM-SUCCESS trial only tested lesions that primary care physicians (PCPs) had already decided to biopsy, meaning this is not a screening tool; nearly all patients were white; and it excluded many important edge cases, such as crusted lesions (as SCCs often are), ulcerated or eroded lesions (as BCCs often are), and patients with six or more lesions concerning for BCC, SCC, or melanoma. In the DERM-ASSESS III trial, of the 44 melanomas biopsied, two (one melanoma in situ and one invasive melanoma) were given “Monitor” scores by the device. Meanwhile, in a clinical utility study of PCPs designed to mimic real-life usage, PCPs’ specificity actually decreased when using the device, from 60.9% to 54.7% for diagnosis and from 44.2% to 32.4% for referrals.42 Among other drawbacks, the device lacks the ability to explain its results.
The study authors cited shortages of dermatologists and lack of access as reasons why skin lesions need to be triaged more effectively in the primary care setting.39 But such low specificity also means a similarly low positive predictive value: even for lesions the device finds most concerning, there is less than a 40% chance that they are actually malignant. And even these metrics are likely inflated, because only lesions that PCPs already wanted to biopsy were selected and so many important types of skin lesions were excluded. It seems doubtful to me that such a device will find much market success when, according to its own website,43 it “does not definitively determine whether or not cancer is present, specify what type of cancer may be present, nor make any decision for the user.” I see this resulting in either more referrals for benign lesions or more missed melanomas. To me, this is the latest example of a dermatologic AI tool designed without real-world usage in mind.
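For readers who want the arithmetic, the positive predictive value (PPV) follows directly from sensitivity (Se), specificity (Sp), and the prevalence (p) of malignancy among tested lesions. The one-in-three prevalence below is an illustrative assumption on my part, not a figure reported by the trial:

PPV = (Se × p) / [Se × p + (1 − Sp) × (1 − p)]
PPV = (0.965 × 1/3) / [(0.965 × 1/3) + (0.793 × 2/3)] ≈ 0.32 / 0.85 ≈ 38%

At lower prevalence, as one would expect if the device were applied to unselected lesions rather than those already chosen for biopsy, the PPV falls even further.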
What does the FDA approval tell us about the future of AI devices in dermatology?
Unlike NeviSense and MelaFind, AI devices that were approved by the FDA under the premarket approval (PMA) pathway (intended for high-risk devices), DermaSensor was authorized under the de novo pathway, which sets a lower bar because it is intended for novel devices of low-to-moderate risk.38 A third FDA pathway, 510(k), has been used by an increasing number of AI and machine learning devices, as it allows clearance if a device is substantially equivalent to a previously approved (predicate) device. Concerningly, a substantial number of these clearances have been based on predicate devices performing unrelated tasks (eg, a breast MRI device served as the predicate reference device for 510(k) clearances of a brain CT device, a diagnostic stress echocardiogram device, and a lung CT device).44 Depending on how the FDA chooses to develop these pathways, we may continue to see approvals of dermatology-adjacent AI devices despite less-than-convincing data. There is real danger that unknowing practitioners will rely on these off-target approvals, causing great harm and adding costs to medicine.
Are insurers using AI to deny care?
Yes. A 2023 investigative report showed that UnitedHealth Group’s NaviHealth relied on an AI algorithm to predict length of stay for post-acute care and to restrict payment for its Medicare Advantage patients accordingly, without any human oversight.45,46 Although AI algorithms are potentially subject to many different forms of bias, this case is particularly illustrative of algorithmic bias, in which an algorithm applied in real-world situations produces unequal outcomes for different population groups. Data from the NaviHealth case revealed that most of the algorithm’s decisions were reversed on appeal, suggesting that the insurer used the AI to deny coverage broadly and that only patients with the time and resources to appeal found resolution. On the other hand, clinics can now use AI in reverse, generating prior authorization appeals and other supporting materials with LLMs.47,48
How do patients view AI use?
Interestingly, when physicians and AI chatbots responded to the same patient questions posted to a public social media forum, blinded evaluators rated the chatbots’ responses as more empathetic.49 But when patients were shown identically worded medical advice labeled as coming from a physician, an AI chatbot, or an AI-assisted physician, they perceived the recommendations attributed to AI or AI-assisted physicians as less reliable and less empathetic than the same advice attributed to a human physician alone.50 Patients want the best care, but they also don’t want their doctor to be an AI.
What is being done to mitigate some of the current challenges with AI in dermatology?
Like many, I am excited about the potential for AI to one day improve workflows and deliver better care. However, these tools must be designed, tested, published, and marketed in ways that avoid creating confusion and further fragmenting care. Since the NaviHealth case was first reported, the Centers for Medicare & Medicaid Services (CMS) has issued a Final Rule limiting the use of algorithms in coverage determinations and clarifying that these decisions must be reviewed by a human.51 The AAD’s Augmented Intelligence Committee, of which I am a member, is tasked with creating dermatology-specific AI guidelines and standards. Thanks in part to its work, there are now standards for how AI tools are reported,52 how dermatology AI algorithms are created,53 and how industry should engage with the AAD regarding AI.54 More work is still needed to ensure these tools are effective, fair, and true to the same rule most of us recited, upon donning our white coats, to begin the practice of medicine: “First, do no harm.”
1. Roose K. A Conversation With Bing’s Chatbot Left Me Deeply Unsettled. The New York Times. Published February 16, 2023. https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
2. Metz C, Isaac M, Mickle T, Weise K, Roose K. Sam Altman Is Reinstated as OpenAI’s Chief Executive. The New York Times. Published November 22, 2023. https://www.nytimes.com/2023/11/22/technology/openai-sam-altman-returns.html
3. Mickle T, Rennison J. Nvidia Becomes Most Valuable Public Company, Topping Microsoft. The New York Times. Published June 18, 2024. https://www.nytimes.com/2024/06/18/technology/nvidia-most-valuable-company.html
4. Mickle T. Scarlett Johansson Said No, but OpenAI’s Virtual Assistant Sounds Just Like Her. The New York Times. Published May 20, 2024. https://www.nytimes.com/2024/05/20/technology/scarlett-johannson-openai-voice.html
5. Isaac M. Meta Unveils New Smart Glasses and Headsets in Pursuit of the Metaverse. The New York Times. Published September 25, 2024. https://www.nytimes.com/2024/09/25/technology/meta-products-artificial-intelligence.html
6. Roose K. The Apple Vision Pro Is a Marvel. But Who Will Buy It? The New York Times. Published January 31, 2024. https://www.nytimes.com/2024/01/31/technology/roose-apple-vision-pro.html
7. Esteva A, Kuprel B, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118. doi:10.1038/nature21056
8. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018;29(8):1836-1842. doi:10.1093/annonc/mdy166
9. Han SS, Kim MS, Lim W, Park GH, Park I, Chang SE. Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm. J Invest Dermatol. Jul 2018;138(7):1529-1538. doi:10.1016/j.jid.2018.01.028
10. Zhou Y, Wang M, Zhao S, Yan Y. Machine Learning for Diagnosis of Systemic Lupus Erythematosus: A Systematic Review and Meta-Analysis. Comput Intell Neurosci. 2022;2022:7167066. doi: 10.1155/2022/7167066
11. Zhang AJ, Lindberg N, et al. Development of an artificial intelligence algorithm for the diagnosis of infantile hemangiomas. Pediatr Dermatol. 2022;39(6):934-936. doi: 10.1111/pde.15149. Epub 2022 Sep 27.
12. Zhang J, Qiu Y, Peng L, Zhou Q, Wang Z, Qi M. A comprehensive review of methods based on deep learning for diabetes-related foot ulcers. Front Endocrinol (Lausanne). 2022;13:945020. doi: 10.3389/fendo.2022.945020.
13. Han SS, Park GH, et al. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network. PLoS One. 2018;13(1):e0191493. doi: 10.1371/journal.pone.0191493.
14. Andres C, Andres-Belloni B, et al. iDermatoPath - a novel software tool for mitosis detection in H&E-stained tissue sections of malignant melanoma. J Eur Acad Dermatol Venereol. 2017;31(7):1137-1147. doi: 10.1111/jdv.14126. Epub 2017 Feb 21.
15. De Logu F, Ugolini F, et al. Recognition of Cutaneous Melanoma on Digitized Histopathological Slides via Artificial Intelligence Algorithm. Front Oncol. 2020;10:1559. doi: 10.3389/fonc.2020.01559.
16. Hekler A, Utikal JS, et al. Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. Eur J Cancer. 2019;118:91-96. doi: 10.1016/j.ejca.2019.06.012. Epub 2019 Jul 18.
17. Ba W, Wang R, et al. Diagnostic assessment of deep learning for melanocytic lesions using whole-slide pathological images. Transl Oncol. 2021;14(9):101161. doi: 10.1016/j.tranon.2021.101161. Epub 2021 Jun 27.
18. Hekler A, Utikal JS, et al. Pathologist-level classification of histopathological melanoma images with deep neural networks. Eur J Cancer. 2019;115:79-83. doi: 10.1016/j.ejca.2019.04.021. Epub 2019 May 23.
19. Brinker TJ, Schmitt M, et al. Diagnostic performance of artificial intelligence for histologic melanoma recognition compared to 18 international expert pathologists. J Am Acad Dermatol. 2022;86(3):640-642. doi: 10.1016/j.jaad.2021.02.009. Epub 2021 Feb 11.
20. Jiang YQ, Xiong JH, et al. Recognizing basal cell carcinoma on smartphone-captured digital histopathology images with a deep neural network. Br J Dermatol. 2020;182(3):754-762. doi: 10.1111/bjd.18026. Epub 2019 Aug 22.
21. Le’Clerc Arrastia J, Heilenkötter N, et al. Deeply Supervised UNet for Semantic Segmentation to Assist Dermatopathological Assessment of Basal Cell Carcinoma. J Imaging. 2021; 7(4):71. doi: 10.3390/jimaging7040071
22. van Zon MCM, van der Waa JD, Veta M, Krekels GAM. Whole-slide margin control through deep learning in Mohs micrographic surgery for basal cell carcinoma. Exp Dermatol. May 2021;30(5):733-738. doi:10.1111/exd.14306
23. Campanella G, Nehal KS, Lee EH, et al. A deep learning algorithm with high sensitivity for the detection of basal cell carcinoma in Mohs micrographic surgery frozen sections. J Am Acad Dermatol. Nov 2021;85(5):1285-1286. doi:10.1016/j.jaad.2020.09.012
24. Sohn GK, Sohn JH, Yeh J, Chen Y, Brian Jiang SI. A deep learning algorithm to detect the presence of basal cell carcinoma on Mohs micrographic surgery frozen sections. J Am Acad Dermatol. May 2021;84(5):1437-1438. doi:10.1016/j.jaad.2020.06.080
25. Jansen P, Baguer DO, Duschner N, et al. Evaluation of a Deep Learning Approach to Differentiate Bowen’s Disease and Seborrheic Keratosis. Cancers (Basel). Jul 20 2022;14(14). doi:10.3390/cancers14143518
26. Decroos F, Springenberg S, Lang T, et al. A Deep Learning Approach for Histopathological Diagnosis of Onychomycosis: Not Inferior to Analogue Diagnosis by Histopathologists. Acta Derm Venereol. Aug 31 2021;101(8):adv00532. doi:10.2340/00015555-3893
27. Jansen P, Creosteanu A, Matyas V, et al. Deep Learning Assisted Diagnosis of Onychomycosis on Whole-Slide Images. J Fungi (Basel). Aug 28 2022;8(9). doi:10.3390/jof8090912
28. Jackson CR, Sriharan A, Vaickus LJ. A machine learning algorithm for simulating immunohistochemistry: development of SOX10 virtual IHC and evaluation on primarily melanocytic neoplasms. Mod Pathol. Sep 2020;33(9):1638-1648. doi:10.1038/s41379-020-0526-z
29. O’Hern K, Yang E, Vidal NY. ChatGPT underperforms in triaging appropriate use of Mohs surgery for cutaneous neoplasms. JAAD Int. Sep 2023;12:168-170. doi:10.1016/j.jdin.2023.06.002
30. Shoham G, Berl A, Shir-Az O, Shabo S, Shalom A. Predicting Mohs surgery complexity by applying machine learning to patient demographics and tumor characteristics. Exp Dermatol. Jul 2022;31(7):1029-1035. doi:10.1111/exd.14550
31. Woodfin MW, Flint N, Nguyen QD. ChatGPT Effectively Triages Real-World Neoplasms Using Mohs Appropriate Use Criteria. Dermatol Surg. Published online November 7, 2024. doi:10.1097/dss.0000000000004487
32. Cuellar-Barboza A, Brussolo-Marroquín E, Cordero-Martinez FC, Aguilar-Calderon PE, Vazquez-Martinez O, Ocampo-Candiani J. An evaluation of ChatGPT compared with dermatological surgeons’ choices of reconstruction for surgical defects after Mohs surgery. Clin Exp Dermatol. Oct 24 2024;49(11):1367-1371. doi:10.1093/ced/llae184
33. Jairath N, Manduca S, Que SKT. ReconGPT: A novel artificial intelligence tool and its potential use in post-Mohs reconstructive decision-making. J Am Acad Dermatol. Published online August 31, 2024. doi:10.1016/j.jaad.2024.08.048
34. Gui H, Rezaei SJ, Schlessinger D, et al. Dermatologists’ Perspectives and Usage of Large Language Models in Practice: An Exploratory Survey. J Invest Dermatol. 2024;144(10):2298-2301. doi:10.1016/j.jid.2024.03.028
35. Krakowski I, Kim J, Cai ZR, et al. Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis. NPJ Digit Med. Apr 9 2024;7(1):78. doi:10.1038/s41746-024-01031-w
36. Tschandl P, Rinner C, Apalla Z, et al. Human-computer collaboration for skin cancer recognition. Nat Med. Aug 2020;26(8):1229-1234. doi:10.1038/s41591-020-0942-0
37. Haggenmüller S, Maron RC, Hekler A, et al. Patients’ and dermatologists’ preferences in artificial intelligence-driven skin cancer diagnostics: A prospective multicentric survey study. J Am Acad Dermatol. Aug 2024;91(2):366-370. doi:10.1016/j.jaad.2024.04.033
38. Venkatesh KP, Kadakia KT, Gilbert S. Learnings from the first AI-enabled skin cancer device for primary care authorized by FDA. NPJ Digit Med. 2024;7(1):156. doi:10.1038/s41746-024-01161-1
39. Merry S, McCormick B, Nguyen V, Chatha V, Croghan I, Leffell D. Clinical Performance of Novel Elastic Scattering Spectroscopy (ESS) in Detection of Skin Cancer: A Blinded, Prospective, Multi-Center Clinical Trial. J Clin Aesthet Dermatol. Apr 2023;16
40. Hartman RI, Trepanowski N, Chang MS, et al. Multicenter prospective blinded melanoma detection study with a handheld elastic scattering spectroscopy device. JAAD Int. Jun 2024;15:24-31. doi:10.1016/j.jdin.2023.10.011
41. Malvehy J, Hauschild A, Curiel-Lewandrowski C, et al. Clinical performance of the Nevisense system in cutaneous melanoma detection: an international, multicentre, prospective and blinded clinical trial on efficacy and safety. Br J Dermatol. Nov 2014;171(5):1099-1107. doi:10.1111/bjd.13121
42. Seiverling E, Agresta T, Cyr P, et al. Clinical Utility of an Elastic Scattering Spectroscopy Device in Assisting Primary Care Physicians’ Detection of Skin Cancers. https://www.dermasensor.com/wp-content/uploads/80-0014.2-Reader-Study.pdf
43. The Future of Skin Cancer Detection, Now FDA Cleared. DermaSensor. https://www.dermasensor.com/#howworks
44. Muehlematter UJ, Bluethgen C, Vokinger KN. FDA-cleared artificial intelligence and machine learning-based medical devices and their 510(k) predicate networks. Lancet Digit Health. 2023;5(9):e618-e626. doi:10.1016/S2589-7500(23)00126-7
45. Mello MM, Rose S. Denial—Artificial Intelligence Tools and Health Insurance Coverage Decisions. JAMA Health Forum. 2024;5(3):e240622. doi:10.1001/jamahealthforum.2024.0622
46. Denied by AI: How Medicare Advantage plans use algorithms to cut off care for seniors in need. STAT. Published March 13, 2023. https://www.statnews.com/2023/03/13/medicare-advantage-plans-denial-artificial-intelligence
47. Diane A, Gencarelli P Jr, Lee JM Jr, Mittal R. Utilizing ChatGPT to Streamline the Generation of Prior Authorization Letters and Enhance Clerical Workflow in Orthopedic Surgery Practice: A Case Report. Cureus. Nov 2023;15(11):e49680. doi:10.7759/cureus.49680
48. Rosenbluth T. In Constant Battle With Insurers, Doctors Reach for a Cudgel: A.I. The New York Times. Published July 10, 2024. https://www.nytimes.com/2024/07/10/health/doctors-insurers-artificial-intelligence.html
49. Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. 2023;183(6):589-596. doi:10.1001/jamainternmed.2023.1838
50. Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nat Med. 2024;30(11):3098-3100. doi:10.1038/s41591-024-03180-7
51. Medicare Program; Contract Year 2024 Policy and Technical Changes to the Medicare Advantage Program, Medicare Prescription Drug Benefit Program, Medicare Cost Plan Program, and Programs of All-Inclusive Care for the Elderly. https://www.federalregister.gov/documents/2023/04/12/2023-07115/medicare-program-contract-year-2024-policy-and-technical-changes-to-the-medicare-advantage-program
52. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Lancet Digit Health. Oct 2020;2(10):e537-e548. doi:10.1016/s2589-7500(20)30218-1
53. Daneshjou R, Barata C, Betz-Stablein B, et al. Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology: CLEAR Derm Consensus Guidelines From the International Skin Imaging Collaboration Artificial Intelligence Working Group. JAMA Dermatol. 2022;158(1):90-96. doi:10.1001/jamadermatol.2021.4915
54. Lee I, Aninos A, Lester J, et al. Engaging industry effectively and ethically in artificial intelligence from the Augmented Artificial Intelligence Committee Standards Workgroup. J Am Acad Dermatol. Aug 2024;91(2):312-314. doi:10.1016/j.jaad.2024.03.036