Development and Validation of Machine Learning and Electronic Medical Records-Based Characterization of Stiff Person Syndrome
Soo Hwan Park1, Seo Ho (Michael) Song2
1Department of Neurology, Dartmouth Hitchcock Medical Center, 2Department of Psychiatry, Beth Israel Deaconess Medical Center
Objective:

To develop and validate an objective framework for identifying the medical history items that best characterize the diagnosis of Stiff Person Syndrome (SPS).

Background:

SPS is a rare neurologic disorder characterized by muscle spasms and painful rigidity. Electronic medical records may contain items that could help refine its diagnostic criteria, but because SPS is rare, conventional statistical tests applied to these data are underpowered. A machine learning approach that leverages these underutilized records may reveal clinical features that are strongly associated with an SPS diagnosis.

Design/Methods:

This retrospective cohort study assessed 23 patients with a diagnosis of SPS and 25 controls, all of whom were anti-GAD positive and were treated at Dartmouth Hitchcock Medical Center. After binarizing the 319 clinical features, we applied an iterative, machine-learning-based feature selection method (the Contribution Selection Algorithm) to identify the variables that best discriminate patients with SPS from anti-GAD-positive controls. Each iteration computed SHapley Additive exPlanations (SHAP) values for every binarized feature. Accuracy, area under the receiver operating characteristic curve (AUC), and the Matthews correlation coefficient (MCC) of the support vector machine (SVM) classifiers were estimated with repeated stratified 4-fold cross-validation.
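The evaluation protocol and the iterative selection loop described above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code: `repeated_stratified_kfold` and `backward_eliminate` are hypothetical helpers, the number of repeats and the random seed are assumptions, and the loop simplifies the Contribution Selection Algorithm to removing a single lowest-contribution feature per round. In a real pipeline, `contribution` would return per-feature SHAP-based scores from the fitted SVM, and `evaluate` would return a metric averaged over the cross-validation splits.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=4, n_repeats=10, seed=0):
    """Hypothetical helper: yield (train_idx, test_idx) pairs.

    In every repeat, each sample lands in exactly one test fold, and each
    fold keeps roughly the overall class ratio (stratification).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for cls in sorted(by_class):
            idx = by_class[cls][:]
            rng.shuffle(idx)
            # Deal shuffled indices round-robin, continuing from where the
            # previous class stopped, so fold sizes differ by at most one.
            for j, i in enumerate(idx):
                folds[(offset + j) % n_splits].append(i)
            offset += len(idx)
        for k in range(n_splits):
            test_set = set(folds[k])
            train = [i for i in range(len(labels)) if i not in test_set]
            yield train, sorted(test_set)


def backward_eliminate(features, contribution, evaluate, min_size=1):
    """Simplified contribution-guided backward elimination (hypothetical).

    contribution(subset) -> {feature: score}, e.g. mean |SHAP value| per feature
    evaluate(subset)     -> cross-validated score, e.g. mean MCC of an SVM
    Drops the lowest-contribution feature each round and returns the best
    subset seen; on ties, the earlier (larger) subset is kept.
    """
    current = list(features)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > min_size:
        scores = contribution(current)
        weakest = min(current, key=lambda f: scores[f])
        current.remove(weakest)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

With 23 SPS patients and 25 controls (`labels = [1] * 23 + [0] * 25`), each 4-fold repeat of this splitter yields four test folds of 12 patients, each containing 5 or 6 SPS cases. Established libraries offer well-tested equivalents, such as scikit-learn's `RepeatedStratifiedKFold`.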

Results:

The feature selection algorithm identified depression, joint pain, hypothyroidism, and gastroesophageal reflux disease (GERD) as the top four predictors of SPS, because the SVM model using these features achieved the best predictive performance, with a binary classification accuracy of 0.775 (95% CI, 0.759–0.790), an AUC of 0.808 (95% CI, 0.791–0.825), and an MCC of 0.565 (95% CI, 0.534–0.597).
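The abstract does not state how the 95% confidence intervals were derived. One common choice, assumed here, is to compute each metric on every test fold of the repeated cross-validation and report the mean with a normal-approximation interval. The sketch below uses hypothetical helper names and plain Python to show accuracy, a rank-based AUC (the probability that a randomly chosen SPS case scores higher than a randomly chosen control), and MCC, with labels 1 = SPS and 0 = control. Because cross-validation folds share training data, intervals computed this way can understate the true uncertainty.

```python
import math
import statistics


def accuracy(y_true, y_pred):
    """Fraction of correct binary predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when undefined
    (e.g. the classifier predicts a single class for every sample)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_ci95(fold_scores):
    """Mean and normal-approximation 95% CI over per-fold scores
    (requires at least two scores)."""
    m = statistics.mean(fold_scores)
    half = 1.96 * statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    return m, m - half, m + half
```

Under this assumption, each reported estimate would be `mean_ci95` applied to the list of per-fold values of the corresponding metric.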

Conclusions:

Our machine-learning-based framework identified features that are strongly associated with SPS. By circumventing the statistical underpowering that limits conventional tests and identifying associations inductively, this approach may serve as a powerful tool for generating new hypotheses about rare disorders. It complements hypothesis-driven investigations; together, the two form a closed-loop approach to diagnostic challenges in neurology.

10.1212/WNL.0000000000203251