To apply large language models (LLMs) to accurately and efficiently extract key multiple sclerosis (MS) metrics from an institutional EHR.
The electronic health record (EHR) provides remarkable opportunities to understand the evolution of chronic conditions such as MS, including across heterogeneous treatment, sociodemographic and clinical landscapes. Efforts to date to extract MS-specific metrics have been stymied by the lack of workflow-friendly, structured, discrete data entry. Using large language models (LLMs) to convert narrative reports in the EHR into discrete data could provide significant benefits for MS research.
An institutional ecosystem that securely connects healthcare data with ChatGPT4 was applied to MS neurologist clinical notes in a single institutional EHR. Three prompts were developed and iteratively refined to classify (a) visit-level Expanded Disability Status Scale (EDSS), Timed 25-Foot Walk (T25FW), MS disease-modifying therapy (DMT) and MS type, and (b) patient-level MS history (dates of symptom onset and diagnosis). Each prompt's output was validated against a manually annotated dataset (N=2500 neurologist notes).
In total, the combined prompts processed 122,790 MS Progress Notes (N=12,134 patients) authored between November 2012 and February 2025 in approximately 186 minutes.
Compared with manual coding, the prompt outputs were highly accurate (>99%) for visit-level T25FW, EDSS and DMT, and for onset year. Clinician categorization of MS type was heterogeneous, which likely accounts for the lower accuracy of this metric (93%).
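The comparison against manual coding can be illustrated with a minimal sketch of a per-field accuracy calculation. This is not the study's actual validation code: the function name, the `note_id` key, the field names and the toy records are all hypothetical, and exact-match agreement is assumed as the scoring rule.

```python
from collections import defaultdict


def per_field_accuracy(llm_rows, manual_rows, fields):
    """Return {field: share of annotated notes where the LLM value equals the manual label}.

    Assumes (hypothetically) that each row is a dict keyed by 'note_id'
    and that agreement means exact equality of the extracted value.
    """
    manual_by_id = {row["note_id"]: row for row in manual_rows}
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in llm_rows:
        reference = manual_by_id.get(row["note_id"])
        if reference is None:
            continue  # note is not part of the manually annotated set
        for field in fields:
            total[field] += 1
            if row.get(field) == reference.get(field):
                correct[field] += 1
    return {f: correct[f] / total[f] for f in fields if total[f]}


# Toy records for illustration only; not study data.
llm_output = [
    {"note_id": 1, "edss": 2.0, "ms_type": "RRMS"},
    {"note_id": 2, "edss": 6.5, "ms_type": "SPMS"},
]
manual_labels = [
    {"note_id": 1, "edss": 2.0, "ms_type": "RRMS"},
    {"note_id": 2, "edss": 6.5, "ms_type": "PPMS"},
]
accuracy = per_field_accuracy(llm_output, manual_labels, ["edss", "ms_type"])
print(accuracy)
```

Scoring each field separately makes it visible when one metric, such as MS type, lags behind the others.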
As a proof-of-concept analysis of the extracted data, DMT utilisation across DMT eras (2010 to 2024) showed the expected changes: declining use of first-line self-injectables and increasing use of B cell depleting therapies.
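A utilisation analysis of this kind could be sketched as a per-year share of visits by DMT class. The following is an illustrative assumption rather than the authors' method: the function name, the `(year, dmt_class)` input shape, the class labels and the toy visits are all hypothetical.

```python
from collections import Counter, defaultdict


def dmt_share_by_year(visits):
    """Return {year: {dmt_class: share of that year's visits}}.

    `visits` is an iterable of (year, dmt_class) pairs; this input
    shape is a hypothetical choice made for the sketch.
    """
    counts = defaultdict(Counter)
    for year, dmt_class in visits:
        counts[year][dmt_class] += 1
    shares = {}
    for year, class_counts in counts.items():
        n_visits = sum(class_counts.values())
        shares[year] = {cls: n / n_visits for cls, n in class_counts.items()}
    return shares


# Toy visits for illustration only; not study data.
visits = [
    (2012, "self_injectable"),
    (2012, "self_injectable"),
    (2012, "b_cell_depleting"),
    (2024, "b_cell_depleting"),
]
shares = dmt_share_by_year(visits)
print(shares)
```

Normalising within each year keeps the trend comparable even when the number of notes per year differs.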
LLMs enabled highly accurate extraction of key MS disease metrics. These metrics can be further paired with output from previously validated LLM prompts (MRI activity in radiology reports, depression indicators in neurologist notes). Together, these offer an efficient, low-cost and scalable approach to analysing clinical trajectories in heterogeneous, real-world MS cohorts.