2026 American Academy of Neurology Abstract Website

Rocío Gómez¹, Aleix Solanes², Enric Monreal Laguillo³, Maria Sepúlveda⁴, Ángel Pérez Sempere⁵, Miguel Angel Hernandez Perez⁶, Juan Pablo Cuello⁷, Gary Álvarez-Bravo⁸, Eduardo Aguera Morales⁹, Javier Riancho¹⁰, Elena García-Arcelay¹, Jorge Maurino¹, Gustavo Saposnik¹¹
¹Medical Department, Roche Farma S.A., ²Institut d'Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Barcelona, Spain, ³Department of Neurology, Hospital Universitario Ramón y Cajal, Madrid, Spain, ⁴Department of Neurology, Hospital Clínic de Barcelona, Barcelona, Spain, ⁵Department of Neurology, Hospital General Universitario de Alicante, Alicante, Spain, ⁶Department of Neurology, Hospital Nuestra Senora de Candelaria, Tenerife, Spain, ⁷Department of Neurology, Hospital Universitario Gregorio Marañón, Madrid, Spain, ⁸Department of Neurology, Hospital Universitari de Girona Dr. Josep Trueta, Girona, Spain, ⁹Department of Neurology, Hospital Universitario Reina Sofía, Córdoba, Spain, ¹⁰Department of Neurology, Hospital Sierrallana-Institute of Research Valdecilla (IDIVAL), Torrelavega, Spain, ¹¹Department of Neurology, St. Michael’s Hospital, University of Toronto, Toronto, Canada

Objective:

This study aimed to compare therapeutic decisions made by neurologists with those made by ChatGPT, assessing each against best-practice guidelines.

Background:

Artificial intelligence (AI) tools can help doctors in clinical decision-making by processing complex information into coherent, evidence-based recommendations. A major challenge in managing neurological disorders is therapeutic inertia (TI), defined as the lack of treatment initiation or intensification when therapeutic goals are unmet.

Design/Methods:

Three cross-sectional online studies were conducted with the Spanish Society of Neurology. Participating neurologists completed 20 simulated case-scenarios on multiple sclerosis (MS) [DiscutirMS], use of serum neurofilament light chain (sNfL) levels [NewFeeLs-MS study], and neuromyelitis optica spectrum disorder (NMOSD) [PREFERENCES-NMOSD study]. Each vignette was also presented to ChatGPT-4o under two conditions: with and without an explicit guideline prompt (context). The primary outcome was a guideline-concordant treatment recommendation. Multivariate logistic regression analyses were used to compare neurologists' performance with that of ChatGPT.

Results:

A total of 290 neurologists participated (n=96 in DiscutirMS, n=116 in NewFeeLs-MS, and n=78 in PREFERENCES-NMOSD). Participants’ characteristics were similar across studies [mean age: 40.2±9.9 years, 150 (51.7%) male, mean years of experience: 14.1±9.6, 174 (60%) had an MS-specific consultation, median MS patients attended/week (IQR): 15 (8-25)].

Overall, ChatGPT was more accurate than neurologists: 80.5% ± 40.1 with context and 72.9% ± 42.8 without context, compared with 66.5% ± 35.0 for neurologists (p=0.001).

After adjusting for the patient characteristics common to the simulated case-scenarios, ChatGPT had greater odds than neurologists of following the latest guidelines (lower TI), both with context (OR=3.59, 95% CI: 2.24-5.57, p<0.0001) and without context (OR=1.68, 95% CI: 1.06-2.66, p=0.03).
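
As a rough illustration of how the odds ratios above relate to the logistic regression described in the methods, the sketch below back-calculates the log-odds coefficient and its standard error from a reported OR and 95% CI. It assumes the intervals are Wald-type (symmetric on the log-odds scale), which the abstract does not state. The function name `wald_summary` is hypothetical and not part of the study's analysis.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Back-calculate log-odds quantities from a reported OR and 95% CI.

    Assumption (not stated in the abstract): the interval is a Wald
    interval, i.e. exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)  # logistic regression coefficient
    # Half the CI width on the log scale, divided by z, gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # For a Wald interval the geometric midpoint should match the OR,
    # up to rounding in the reported figures.
    midpoint = math.sqrt(ci_low * ci_high)
    return beta, se, midpoint


# Figures as reported in the results above.
for label, (or_, lo, hi) in {
    "with context": (3.59, 2.24, 5.57),
    "without context": (1.68, 1.06, 2.66),
}.items():
    beta, se, mid = wald_summary(or_, lo, hi)
    print(f"{label}: beta={beta:.3f}, se={se:.3f}, CI midpoint={mid:.2f}")
```

A gap between the CI midpoint and the reported OR would point to rounding or to a non-Wald interval, not necessarily to an error.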

Conclusions:

ChatGPT outperformed neurologists in simulated MS and NMOSD scenarios, showing higher adherence to guidelines and reduced TI. This advantage held with and without explicit guideline prompts, highlighting AI’s potential to support timely treatment decisions. Incorporating such tools into practice and training may accelerate best-practice adoption and improve patient outcomes.