Outsmarting Therapeutic Inertia: Can ChatGPT-4o Beat Neurologists in Multiple Sclerosis and Neuromyelitis Optica Spectrum Disorder Care?
Rocío Gómez1, Aleix Solanes2, Enric Monreal Laguillo3, Maria Sepúlveda4, Ángel Pérez Sempere5, Miguel Angel Hernandez Perez6, Juan Pablo Cuello7, Gary Álvarez-Bravo8, Eduardo Aguera Morales9, Javier Riancho10, Elena García-Arcelay1, Jorge Maurino1, Gustavo Saposnik11
1Medical Department, Roche Farma S.A., 2Institut d'Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Barcelona, Spain, 3Department of Neurology, Hospital Universitario Ramón y Cajal, Madrid, Spain, 4Department of Neurology, Hospital Clínic de Barcelona, Barcelona, Spain, 5Department of Neurology, Hospital General Universitario de Alicante, Alicante, Spain, 6Department of Neurology, Hospital Nuestra Senora de Candelaria, Tenerife, Spain, 7Department of Neurology, Hospital Universitario Gregorio Marañón, Madrid, Spain, 8Department of Neurology, Hospital Universitari de Girona Dr. Josep Trueta, Girona, Spain, 9Department of Neurology, Hospital Universitario Reina Sofía, Córdoba, Spain, 10Department of Neurology, Hospital Sierrallana-Institute of Research Valdecilla (IDIVAL), Torrelavega, Spain, 11Department of Neurology, St. Michael's Hospital, University of Toronto, Toronto, Canada
Objective:

To compare the therapeutic decisions of neurologists with those of ChatGPT-4o, using best-practice guidelines as the reference standard.

Background:
Artificial intelligence (AI) tools can assist clinicians in decision-making by distilling complex information into coherent, evidence-based recommendations. A major challenge in managing neurological disorders is therapeutic inertia (TI), defined as the failure to initiate or intensify treatment when therapeutic goals are unmet.
Design/Methods:
Three cross-sectional online studies were conducted in collaboration with the Spanish Society of Neurology. Participating neurologists completed 20 simulated case-scenarios on multiple sclerosis (MS) [DiscutirMS study], use of serum neurofilament light chain (sNfL) levels [NewFeeLs-MS study], and neuromyelitis optica spectrum disorder (NMOSD) [PREFERENCES-NMOSD study]. Each vignette was also presented to ChatGPT-4o under two conditions: with and without an explicit guideline prompt (context). The primary outcome was a guideline-concordant treatment recommendation. Multivariable logistic regression analyses were used to compare neurologists' performance with that of ChatGPT.
Results:

A total of 290 neurologists participated (n=96 in DiscutirMS, n=116 in NewFeeLs-MS, and n=78 in PREFERENCES-NMOSD). Participants' characteristics were similar across studies [mean age: 40.2±9.9 years, 150 (51.7%) male, mean years of experience: 14.1±9.6, 174 (60%) had an MS-specific consultation, median (IQR) number of MS patients seen per week: 15 (8-25)].

Overall, ChatGPT was more accurate than neurologists: 80.5% ± 40.1 with context and 72.9% ± 42.8 without context, versus 66.5% ± 35.0 for neurologists (p=0.001).

After adjusting the model for the patient characteristics common to the simulated case-scenarios, ChatGPT had greater odds of following the latest guidelines (lower TI) than neurologists, both with context (OR=3.59, 95% CI: 2.24-5.57, p<0.0001) and without context (OR=1.68, 95% CI: 1.06-2.66, p=0.03).
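As a reading aid, the reported odds ratios and confidence intervals can be checked for internal consistency with their p-values. The sketch below assumes a Wald-type (normal approximation) interval on the log-odds scale, which the abstract does not state; the function name `wald_from_or` and the critical value 1.96 are illustrative choices, not part of the study.

```python
import math


def wald_from_or(or_point, ci_low, ci_high, z_crit=1.96):
    """Back-calculate the log-odds coefficient, its standard error,
    the z statistic, and a two-sided p-value from an odds ratio and
    its 95% CI, assuming a Wald interval on the log scale."""
    beta = math.log(or_point)
    # A Wald CI is beta +/- z_crit * SE, so its width on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Values as reported in the Results.
reported = {
    "with context": (3.59, 2.24, 5.57),
    "without context": (1.68, 1.06, 2.66),
}

for label, (or_point, low, high) in reported.items():
    beta, se, z, p = wald_from_or(or_point, low, high)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.2g}")
```

Under this assumption, the back-calculated p-value for the comparison without context is close to the reported 0.03, and the one with context is far below 0.0001, so the reported figures appear mutually consistent.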
Conclusions:
ChatGPT outperformed neurologists in simulated MS and NMOSD scenarios, showing higher adherence to guidelines and reduced TI. Its advantage held with or without explicit guideline prompts, highlighting AI's potential to support timely treatment decisions. Incorporating such tools into practice and training may accelerate best-practice adoption and improve patient outcomes.
10.1212/WNL.0000000000215042
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.