Identifying Multiple Sclerosis Relapses from Clinical Notes Using Combined Rule-based and Deep Learning Methodologies
Iris Chin1, Heather Moss2, Kathryn Sands1, Manan Kocher1, Oguz Akkas1, Aracelis Torres1
1Verana Health, 2Stanford University

To develop an algorithm that extracts multiple sclerosis (MS) relapse events from clinical notes in the American Academy of Neurology Axon Registry, a neurology-specific patient registry that uses real-world electronic health record (EHR) data.


Relapse frequency is a key outcome measure for MS patients because it indicates disease activity. MS relapses are often documented in unstructured clinical notes rather than in structured EHR fields, so automated extraction of this information would better enable real-world evidence (RWE) studies.


As of May 2022, 46,600 MS patients were identified, using structured ICD codes and mentions of MS in clinical notes, from the 3.2 million patients and their 24 million patient visits in the Axon Registry. A combined rule-based and deep learning (DL) approach was developed to classify the relapse status of these patients at a given encounter as current relapse, no relapse, discussion of past relapse only, or unknown.

One thousand notes were randomly sampled from the MS patient notes that contained relapse phrases, as identified via string searches. A clinical expert labeled the relapse status of each note to generate training, validation, and test sets (70/15/15 split). Using the training and validation sets, a DL model was trained to classify each note into one of the relapse statuses. Performance was assessed on the test set.
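The sampling and splitting steps described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the phrase list in `RELAPSE_PATTERN`, the function names, and the fixed random seed are all assumptions, since the abstract does not give the actual search terms or the sampling procedure.

```python
import random
import re

# Hypothetical relapse phrases; the abstract does not list the real search terms.
RELAPSE_PATTERN = re.compile(r"\b(relapse|relapses|exacerbation|flare)\b", re.IGNORECASE)


def contains_relapse_phrase(note: str) -> bool:
    """Return True if the note mentions any of the assumed relapse phrases."""
    return RELAPSE_PATTERN.search(note) is not None


def sample_and_split(notes, n_samples=1000, fractions=(0.70, 0.15, 0.15), seed=0):
    """Keep notes with a relapse phrase, sample them, and split into three sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    candidates = [note for note in notes if contains_relapse_phrase(note)]
    sampled = rng.sample(candidates, min(n_samples, len(candidates)))

    n_train = round(fractions[0] * len(sampled))
    n_val = round(fractions[1] * len(sampled))
    train = sampled[:n_train]
    val = sampled[n_train:n_train + n_val]
    test = sampled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

With 1,000 sampled notes, this yields 700 training, 150 validation, and 150 test notes. Labeling by a clinical expert would happen between sampling and splitting and is not shown here.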


The model had an overall accuracy of 0.88. For current relapse status, a sensitivity of 0.83 and specificity of 0.97 were achieved.
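For readers unfamiliar with per-class metrics in a multi-class setting, the sketch below shows one common way to obtain sensitivity and specificity for a single status, by treating that status as positive and all others as negative (one-vs-rest). The abstract does not state how the reported figures were computed, so this convention, the label strings, and the function name are assumptions.

```python
def one_vs_rest_metrics(y_true, y_pred, positive_label):
    """Compute overall accuracy plus sensitivity/specificity for one class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive_label and pred == positive_label:
            tp += 1  # relapse correctly identified
        elif truth == positive_label:
            fn += 1  # relapse missed
        elif pred == positive_label:
            fp += 1  # relapse predicted where there was none
        else:
            tn += 1  # non-relapse correctly identified

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "accuracy": correct / len(y_true) if y_true else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical label strings for the four relapse statuses.
y_true = ["current", "current", "none", "past", "unknown", "none"]
y_pred = ["current", "none", "none", "past", "current", "none"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive_label="current")
```

Accuracy here is computed across all four statuses, while sensitivity and specificity apply only to the chosen positive class.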

Identified MS patients averaged 0.58 ± 0.55 relapses/year, and the proportion of patients with relapses decreased over time (2014: 14.74% vs. 2021: 9.86%), consistent with clinical expectations.


We used a combined rule-based and DL methodology to extract relapses from clinical notes. The performance metrics and the clinically consistent patterns in the results support the use of this scalable algorithm for RWE studies.