Assessment of ChatGPT’s performance on Neurology Written Board Examination Questions
Tse Chiang Chen1, Evan Multala1, Patrick Kearns1, Arthur Wang2
1Tulane School of Medicine, 2Tulane Center for Clinical Neurosciences
Objective:
To evaluate the performance of ChatGPT in answering neurology board-style questions.
Background:
Artificial intelligence (AI) models like ChatGPT have gained prominence in various professional fields, including healthcare. To further study the possible utility of this novel tool in a healthcare setting, we evaluated the performance of ChatGPT in answering neurology board-style questions.
Design/Methods:
Neurology board-style questions were accessed from Board Vitals, a commercial neurology question bank. ChatGPT (GPT-4 via Microsoft Bing Chat) was provided the full question prompt and answer choices, and was given up to three attempts to select the correct answer. A total of 560 questions (14 blocks of 40 questions) were used; image-based questions were excluded because ChatGPT cannot process visual input. ChatGPT's answers were then compared with human user data provided by the question bank to gauge its performance.
Results:
Out of 509 eligible questions over 14 question blocks, ChatGPT correctly answered 335 questions (65.8%) on the first attempt and 383 (75.3%) over three attempts, translating to approximately the 26th and 50th percentiles, respectively. The highest-performing subjects were Pain (100%), Epilepsy & Seizures (85%), and Genetics (82%), while the lowest-performing subjects were Imaging/Diagnostic Studies (27%), Critical Care (41%), and Cranial Nerves (48%).
Conclusions:
This study found that ChatGPT performed comparably to its human counterparts. The AI's accuracy increased with subsequent attempts, and its performance fell within the expected range of neurology learners. These results demonstrate ChatGPT's potential in processing specialized medical information. Future studies should better define the extent to which AI can be integrated into medical decision-making.

10.1212/WNL.0000000000204909