Evaluating the Accuracy of Large Language Models in Stroke Management
Sahil Suvarna1, Karim Makhoul1, Ahmad Shamulzai1, Angela Xia1, Gregory Kurgansky1, Richard Libman1
1Northwell Health
Objective:
We aim to evaluate the accuracy of large language artificial intelligence models (LLM-AI) in acute and inpatient stroke management.
Background:
Artificial intelligence is currently being investigated as a tool for medical decision making. Imaging-analysis AI software (e.g., Rapid AI, Brainomix, VizAI, Aidoc) is used as an adjunct to medical decision making in acute stroke patients. LLM-AI tools (ChatGPT, DoximityGPT, OpenEvidence) have been used in research studies to help execute medical decisions. However, their accuracy in stroke diagnosis and management has not been systematically assessed.
Design/Methods:
We provided three different LLMs (ChatGPT, DoximityGPT, OpenEvidence) with 25 real-life stroke scenarios (five cases for each of the five stroke mechanisms defined by the TOAST criteria). Cases were collected from October 2024 to September 2025, and LLM responses were collected from September 20 to October 1, 2025, using the most recent LLM models. Accuracy of LLM-AI medical decision making was evaluated using criteria including adherence to guidelines, identification of stroke mechanism, and treatment decisions. AI responses were compared with decisions provided by two neurologists.
Results:
Accuracy in identifying stroke mechanism ranged from 60% to 76%, highest for large-artery atherosclerosis and lowest for embolic stroke of undetermined source. Accuracy of acute stroke management decisions was lower than expected, ranging from 44% to 68%, with OpenEvidence providing the most accurate decision making (68%). Overall adherence to guidelines ranged from 60% to 74%, with ChatGPT achieving the highest overall adherence (74%).
Conclusions:
LLM-AI demonstrated moderate accuracy in stroke management. Future work should explore integration of these tools into clinical workflows in an adjunctive role.
10.1212/WNL.0000000000215348
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.