AI System Outperforms Human Neurologists in Diagnostic Accuracy While Reducing Healthcare Costs and Time
Moran Sorka1, Shahar Shelly2, Dvir Aran1, Alon Gorenshtein2, Pannathat Soontrapa3, Hillel Abramovitch2
1Technion, 2Rambam Medical Center, 3Siriraj Hospital
Objective:

We aimed to evaluate whether a specialized multi-agent system is non-inferior to, or exceeds, neurologist performance in complex diagnostic scenarios while optimizing resource utilization and diagnostic efficiency.

Background:

With fewer than 18,000 practicing neurologists serving over 340 million Americans, there is an urgent need for AI systems that can provide expert-level diagnostic capability while addressing healthcare economic pressures. We developed a novel platform that simulates realistic clinical workflows to evaluate AI diagnostic performance beyond traditional accuracy metrics.

Design/Methods:

We created a platform enabling interactive diagnostic sessions between healthcare providers and an AI with access to complete case information. Sixteen neurological cases of varying complexity were curated from peer-reviewed sources spanning multiple subspecialties. We compared 14 neurologists against language models and Gregory, which integrates information theory with clinical reasoning by quantifying each diagnostic test's value using information gain, cost-benefit analysis, and strategic prioritization. Primary outcomes included diagnostic accuracy, procedural costs (CPT codes), and time to diagnosis using actual procedure durations.
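To illustrate what quantifying a test's value by information gain can look like, the following is a minimal sketch and not the Gregory implementation. It assumes a discrete prior over candidate diagnoses and a binary test result, and it ranks tests by expected entropy reduction per dollar. The function names, the probabilities, the dollar costs, and the bits-per-dollar score are all hypothetical choices made for this example.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def posterior(prior, likelihoods):
    """Bayes update: prior[d] * P(result | d), renormalized."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def expected_information_gain(prior, p_positive_given_dx):
    """Expected entropy reduction over diagnoses from one binary test."""
    h_prior = entropy(prior)
    p_negative_given_dx = [1 - p for p in p_positive_given_dx]
    gain = 0.0
    for likelihoods in (p_positive_given_dx, p_negative_given_dx):
        # Marginal probability of observing this result.
        p_result = sum(p * l for p, l in zip(prior, likelihoods))
        if p_result == 0:
            continue  # this result cannot occur, so it contributes nothing
        gain += p_result * (h_prior - entropy(posterior(prior, likelihoods)))
    return gain

def rank_tests(prior, tests):
    """Rank tests by expected bits gained per dollar (hypothetical score)."""
    scored = [
        (name, expected_information_gain(prior, lik) / cost)
        for name, (lik, cost) in tests.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical inputs: three candidate diagnoses and two tests.
prior = [0.5, 0.3, 0.2]
tests = {
    # name: (P(positive | diagnosis), illustrative cost in dollars)
    "test_a": ([0.9, 0.1, 0.1], 200.0),   # discriminates the first diagnosis
    "test_b": ([0.5, 0.5, 0.5], 1500.0),  # uninformative and expensive
}
ranking = rank_tests(prior, tests)
```

In this toy setup the uninformative test has zero expected gain regardless of its cost, so it ranks last. A fuller cost-benefit analysis would also need to weigh factors such as procedure duration and risk, which this sketch leaves out.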

Results:

Human neurologists achieved 81% diagnostic accuracy (79% residents, 88% attendings) while Gregory achieved perfect diagnostic accuracy (100%). Gregory demonstrated superior performance in challenging cases such as arrhythmogenic cardiomyopathy presenting with neurological symptoms (where all human participants failed), complex neuroimmunologic conditions like neurosarcoidosis, and movement disorders requiring systematic exclusion. Gregory utilized targeted neurological procedures (nerve conduction studies, evoked potentials) in 47% of cases versus 23-41% for human providers, while requiring fewer broad imaging studies. Cost analysis revealed Gregory averaged $1,423 per case compared to $3,041 for human neurologists (p=0.008), with significantly faster diagnosis (23 vs 43 days, p=0.002).

Conclusions:

We showed that our agent can significantly outperform neurologists in diagnostic accuracy while reducing healthcare costs by 53% and diagnostic time by 47%. These findings suggest that AI-assisted diagnosis has the potential to address neurologist shortages while improving diagnostic efficiency, though careful clinical validation and human-AI collaboration frameworks remain essential for safe implementation.
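The 53% and 47% reductions follow from the per-case cost and time-to-diagnosis figures reported in the Results. As a quick arithmetic check (the helper name `percent_reduction` is ours, introduced only for illustration):

```python
def percent_reduction(baseline, new):
    """Relative reduction from baseline to new, as a percentage."""
    return 100.0 * (baseline - new) / baseline

# Figures taken from the Results section.
cost_reduction = percent_reduction(3041, 1423)  # dollars per case
time_reduction = percent_reduction(43, 23)      # days to diagnosis

print(round(cost_reduction), round(time_reduction))  # prints "53 47"
```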

10.1212/WNL.0000000000212751
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.