Machine Learning Model Flags HS Up to Two Years Before Diagnosis: Report

04/22/2025

A machine learning model analyzing U.S. insurance claims data was able to predict hidradenitis suppurativa (HS) up to two years before formal diagnosis, a new study shows.

"Because the individual symptoms of cysts and abscesses, particularly in the early stages of the disease, mimic those seen in more common conditions, lesions are often misdiagnosed as boils or furunculosis, and patients may wait up to 10 years to receive an actual diagnosis of HS," the authors wrote. "Other causes of diagnostic delay include the stigmatizing nature of the disease, lack of awareness among patients and healthcare providers, and lack of access to a dermatologist."

To shorten time to diagnosis, the authors developed and evaluated multiple predictive models using 13.5 years of insurance data from 5.9 million individuals; this sample included 13,886 patients with HS and 69,428 matched controls. They tested three machine learning algorithms (logistic regression, random forest, and XGBoost) against a clinician-informed baseline model to identify patients likely to develop HS based on prior healthcare interactions.
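For a sense of how the modeling sample is composed, the reported counts can be checked with a few lines of Python. This is only arithmetic on the figures above; the variable names are illustrative, and reading the result as roughly five controls per case is an inference from the counts, not a design detail stated in this report.

```python
# Counts as reported in the article; variable names are illustrative.
hs_cases = 13_886
matched_controls = 69_428

# Ratio of controls to cases in the modeling sample.
controls_per_case = matched_controls / hs_cases

# Share of the case-plus-control sample that has HS.
case_share_of_sample = hs_cases / (hs_cases + matched_controls)

print(f"controls per case: {controls_per_case:.2f}")
print(f"HS share of modeling sample: {case_share_of_sample:.1%}")
```

If that reading is right, about one in six people in the modeling sample has HS, which is worth keeping in mind when interpreting metrics that depend on how common the condition is in the evaluated group.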

The XGBoost model outperformed all others, achieving an area under the receiver operating characteristic curve (AUC) of 0.80 vs. 0.71 for the baseline model. In a held-out test set, XGBoost achieved an AUC of 0.81, an F1-score of 0.47, and a recall of 0.75. The top predictors included age, sex, obesity, and prescriptions for clindamycin phosphate and sulfamethoxazole/trimethoprim.
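The report gives F1-score and recall but not precision. As a rough sketch, precision can be backed out from the other two, assuming the F1-score is the standard harmonic mean of precision and recall and that both figures describe the same decision threshold on the test set. The function name below is hypothetical, and the inputs are the rounded values quoted above, so the result is approximate.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for precision P."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be less than 2 * recall")
    return f1 * recall / denom


# Rounded test-set values quoted in the article.
precision = implied_precision(f1=0.47, recall=0.75)
print(f"implied precision: {precision:.2f}")
```

Under those assumptions, roughly one in three patients flagged in the test set would be a true HS case, while about three in four true cases would be caught.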

The authors noted limitations that included the absence of race/ethnicity data and of unstructured clinical details such as lesion location.

"We emphasize that the aim of this exploratory study is not the development of a clinical diagnostic tool, a goal that would need further validation of the approach in diverse data sources as well as model calibration to ensure acceptable false-positive and false-negative rates in the intended target population," the researchers wrote in the discussion portion of the analysis. "Still, as an initial proof-of-concept exercise, our findings demonstrate the potential to detect HS earlier using decision-support algorithms developed with longitudinal healthcare data such as medical insurance claims."

Source: Ali W, et al. JID Innovations. 2025. doi:10.1016/j.xjidi.2025.100362 
