Published: May 20, 2021

Objective Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. This study aimed to validate a machine learning model to estimate SLEDAI score categories using clinical notes and to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies.

Methods A machine learning model was developed to estimate an individual patient’s SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calculated on a binarised version of the outcome in which the positive class comprised records with clinician-recorded SLEDAI scores >5 and the negative class comprised records with scores ≤5. Model performance was also evaluated by categorising the scores into the four disease activity categories and calculating Spearman’s R and Pearson’s R values.
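The evaluation described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy data and, in particular, the four-category cut points (0; 1 to 5; 6 to 10; ≥11) are assumptions made for illustration only, since the abstract states only the binarisation threshold of 5. The sketch uses the Python standard library and computes the AUC as the probability that a randomly chosen positive record receives a higher model score than a randomly chosen negative one.

```python
from statistics import mean


def binarise(sledai_scores, threshold=5):
    """Positive class (1) if clinician-recorded SLEDAI > threshold, else 0."""
    return [1 if s > threshold else 0 for s in sledai_scores]


def to_category(score):
    """Map a SLEDAI score to 0..3. Cut points are ASSUMED, not from the abstract."""
    if score == 0:
        return 0  # no activity
    if score <= 5:
        return 1  # mild activity
    if score <= 10:
        return 2  # moderate activity
    return 3      # high/very high activity


def auc(labels, scores):
    """AUC as P(score of positive > score of negative); ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    recorded = [0, 2, 4, 6, 8, 12, 3, 15]   # clinician-recorded SLEDAI scores
    model_score = [0.05, 0.2, 0.4, 0.6, 0.55, 0.9, 0.3, 0.8]  # hypothetical P(score > 5)
    predicted_cat = [0, 1, 1, 2, 2, 3, 1, 3]  # hypothetical estimated categories

    true_cat = [to_category(s) for s in recorded]
    print("AUC:", auc(binarise(recorded), model_score))
    print("Spearman:", spearman_r(true_cat, predicted_cat))
    print("Pearson:", pearson_r(true_cat, predicted_cat))
```

Because the four categories are ordinal and heavily tied, the Spearman coefficient here is computed on average ranks rather than with the no-ties shortcut formula.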

Results The AUC for the binarised outcome (scores >5 vs ≤5) was 0.93 for the development cohort and 0.91 for the validation cohort. The model had a Spearman’s R value of 0.7 and a Pearson’s R value of 0.7 when calculated using the four disease activity categories.

Conclusion The model performs well when estimating SLEDAI score categories using unstructured clinical notes.