Multimodal "Toolkit" Enhances AI Detection of Skin Lesions: Researchers

07/22/2025

Key Takeaways

  • The open-access dataset MILK10k offers more than 10,000 images (5,240 lesions, each with a paired dermatoscopic and clinical view) across 48 diagnosis categories aligned with ISIC-DX, according to a new paper in the Journal of Investigative Dermatology.

  • Diverse skin tones and lesion types, including underrepresented non-pigmented skin cancers, are prominently represented.

A new open-access dataset, MILK10k, provides over 10,000 multimodal images to improve machine learning performance in the diagnosis of both pigmented and non-pigmented skin cancers and simulators.

According to researchers publishing the paper in the Journal of Investigative Dermatology, the MILK10k dataset includes 10,480 images from 5,240 lesions, with each lesion represented by a paired close-up and dermatoscopic view. More than 95% of cases were histopathologically confirmed. Images were retrospectively collected from five centers worldwide: sites in Austria, Turkey, and the U.S., and community clinics in North Macedonia and Australia.

The dataset spans 48 ISIC-DX-aligned diagnoses, covering neoplastic and non-neoplastic lesions. The researchers said the dataset also includes annotations for clinically relevant features such as pigmentation, ulceration, erythema, and vessels. Skin tones were algorithmically categorized into clusters.

“Significantly higher pigmentation values were observed in melanomas and nevi compared to other diagnostic classes,” the authors added. “In contrast, erythema values were highest in inflammatory lesions and basal cell carcinomas (BCCs).”

The authors tested an associated machine learning pipeline that applies hierarchical classification to both dermatoscopic and clinical images, yielding a top-1 accuracy of 53.6% and a specificity of 96%. They also proposed a hierarchical error metric based on the ISIC-DX ontology to provide a more clinically relevant model assessment than standard accuracy scores.
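
For readers less familiar with the two headline numbers, the following is a minimal sketch of how top-1 accuracy and a one-vs-rest specificity can be computed for a multi-class classifier. The labels are invented toy data and the function names are hypothetical. The article does not say how specificity was averaged across classes, so the macro average used here is an assumption, not the authors' method.

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of cases whose single predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_specificity(y_true, y_pred):
    """Mean one-vs-rest specificity, TN / (TN + FP), over all classes.

    Assumption: classes are weighted equally (macro average).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        # Predictions for every case that is truly NOT this class.
        negatives = [p for t, p in zip(y_true, y_pred) if t != cls]
        if not negatives:
            continue  # no negative cases, specificity undefined for this class
        true_negatives = sum(p != cls for p in negatives)
        scores.append(true_negatives / len(negatives))
    if not scores:
        raise ValueError("specificity is undefined with a single class")
    return sum(scores) / len(scores)


# Toy example with invented labels (not MILK10k data).
y_true = ["melanoma", "nevus", "bcc", "nevus"]
y_pred = ["melanoma", "nevus", "nevus", "bcc"]
accuracy = top1_accuracy(y_true, y_pred)
specificity = macro_specificity(y_true, y_pred)
```

When there are many diagnostic classes, most cases are negatives for any single class, which is one reason specificity can be high while top-1 accuracy remains moderate.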

“Traditional performance metrics may be less informative when dealing with large sets of diagnostic categories across multiple levels of granularity,” the authors wrote. “We propose a metric based on the average distance between predicted and true diagnoses, measured as steps within a structured classification hierarchy.”
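
The idea in the quote above can be sketched as an average tree distance. In this minimal Python illustration, the hierarchy is a small invented example and not the ISIC-DX ontology. The names `PARENT`, `tree_distance`, and `mean_hierarchical_distance` are hypothetical, and the authors' exact definition (for example, any normalization) may differ.

```python
# Toy hierarchy stored as child -> parent. Illustrative only, NOT ISIC-DX.
PARENT = {
    "benign": "root",
    "malignant": "root",
    "nevus": "benign",
    "seborrheic keratosis": "benign",
    "melanoma": "malignant",
    "basal cell carcinoma": "malignant",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the top of the hierarchy."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_distance(a, b):
    """Number of steps (edges) between two diagnoses in the hierarchy."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:  # first shared ancestor on b's way up
            return steps_from_a[n] + steps_from_b
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def mean_hierarchical_distance(y_true, y_pred):
    """Average tree distance between true and predicted diagnoses."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = sum(tree_distance(t, p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Toy example: one correct call, one benign-for-benign mix-up,
# and one benign-for-malignant mix-up.
y_true = ["nevus", "nevus", "nevus"]
y_pred = ["nevus", "seborrheic keratosis", "melanoma"]
score = mean_hierarchical_distance(y_true, y_pred)
```

In this sketch, confusing a nevus with another benign lesion costs fewer steps than confusing it with a melanoma, whereas plain accuracy would count both as equally wrong.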

Source: Philipp T, et al. J Invest Dermatol. 2025. doi:10.1016/j.jid.2025.06.1594
