2026 American Academy of Neurology Abstract Website

Objective:

We aimed to evaluate whether AI can match the clinical reasoning abilities of neurologists on actual board certification questions that mirror real-world diagnostic challenges.

Background:

Board certification exams test the ability to integrate complex clinical findings, recognize patterns, and make diagnostic decisions under uncertainty. The most challenging cases require distinguishing between conditions with overlapping presentations, such as differentiating myasthenia gravis from Lambert-Eaton syndrome in cancer patients, or recognizing atypical movement disorders that mimic multiple conditions. We tested whether current sophisticated AI systems could handle this level of clinical complexity across subspecialties where even experienced neurologists may struggle outside their primary expertise.

Design/Methods:

We used 305 actual questions from recent neurology board exams covering all major subspecialties: movement disorders, neuromuscular disease, vascular neurology, neuroimmunology, epilepsy, neuro-oncology, and behavioral neurology. The most challenging cases involved temporal pattern recognition (symptoms that fluctuate throughout the day), multi-system presentations requiring anatomical localization, and rare conditions with subtle diagnostic clues. We tested several AI approaches, including one designed to work more like an experienced neurologist by systematically analyzing cases, gathering relevant knowledge, and validating conclusions before making diagnostic decisions.
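The three stages named above (analyze the case, gather relevant knowledge, validate the conclusion before answering) can be pictured as a simple pipeline. The following is only an illustrative sketch and not the system used in this study: the names `Question`, `Trace`, and `answer_question`, and the four stage callables, are hypothetical placeholders for whatever models or tools fill each role.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Question:
    stem: str                 # clinical vignette text
    options: Dict[str, str]   # answer key -> answer text


@dataclass
class Trace:
    findings: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    answer: str = ""
    validated: bool = False


def answer_question(
    q: Question,
    analyze: Callable[[Question], List[str]],
    retrieve: Callable[[List[str]], List[str]],
    propose: Callable[[Question, List[str], List[str]], str],
    validate: Callable[[Question, str, List[str]], bool],
) -> Trace:
    """Hypothetical analyze -> retrieve -> propose -> validate pipeline."""
    trace = Trace()
    # Stage 1: extract the key clinical findings from the vignette.
    trace.findings = analyze(q)
    # Stage 2: gather knowledge relevant to those findings.
    trace.evidence = retrieve(trace.findings)
    # Stage 3: propose an answer, then check it before committing.
    trace.answer = propose(q, trace.findings, trace.evidence)
    trace.validated = validate(q, trace.answer, trace.evidence)
    return trace
```

Passing the stages in as callables keeps the sketch independent of any particular model; the returned `Trace` records what each stage produced so that an unvalidated answer can be flagged rather than silently accepted.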

Results:

Standard AI models performed inconsistently, particularly struggling with cases requiring integration of multiple clinical concepts across subspecialties. However, our systematic approach substantially improved performance, from 69.5% to 89.2% on challenging questions. The system excelled at complex diagnostic scenarios, including recognizing neurosarcoidosis with longitudinal myelitis, diagnosing Wilson's disease in atypical presentations, and identifying autoimmune encephalitis with psychiatric symptoms. Most importantly, it maintained consistent performance across subspecialties including neuroimmunology, genetic neurology, and neuro-ophthalmology, areas where generalists often struggle with specialized diagnostic criteria.
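To put the change from 69.5% to 89.2% in context, the short calculation below expresses it three ways. It is a sketch that assumes both figures are percent-correct scores on the same set of challenging questions; the function name `score_gain` is ours, and the size of that question subset is not stated here, so no counts are derived.

```python
def score_gain(baseline_pct: float, improved_pct: float):
    """Return (absolute gain in points, relative gain %, error reduction %)."""
    absolute = improved_pct - baseline_pct
    # Gain relative to the baseline score.
    relative = absolute / baseline_pct * 100
    # Share of the baseline's incorrect answers that were eliminated.
    error_reduction = absolute / (100 - baseline_pct) * 100
    return round(absolute, 1), round(relative, 1), round(error_reduction, 1)


print(score_gain(69.5, 89.2))  # (19.7, 28.3, 64.6)
```

Under that assumption, the gain is 19.7 percentage points, about a 28% relative improvement, with roughly two thirds of the baseline errors removed.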

Conclusions:

We showed that AI systems can achieve neurologist-level performance on complex diagnostic reasoning when designed to mirror how experienced clinicians think through cases. This suggests real potential for AI assistance in challenging diagnoses, particularly in areas where subspecialty expertise may be limited.