ChatGPT as Clinical Decision-Making Tool for Headache Screening
Aizara Ermekbaeva1, Tse Chiang Chen2, Mei Yang3, Amy Urbina Pacheco1, Alireza Shirazian4, Nimmi Wickramasuriya5, Jonathan Rodriguez6, Michele Longo1
1Tulane School of Medicine, 2Northwestern University, 3Tulane University School of Medicine, 4University of Chicago, 5Tulane University Hospital and Clinics, 6Tulane Medical Center
Objective:

This study evaluates ChatGPT-4’s ability to differentiate between primary and secondary headaches using the SNNOOP10 criteria.

Background:

Headaches are a prevalent issue, accounting for 3.5 million annual emergency department (ED) visits in the United States. They are classified into primary and secondary headaches, with secondary headaches potentially indicating serious underlying conditions. The SNNOOP10 mnemonic helps screen for red flags suggesting secondary headaches. Artificial Intelligence (AI) algorithms have shown promise in assisting with patient triage in EDs and primary care settings.

Design/Methods:

Two GPT-4-based chatbots, Microsoft Bing Copilot and OpenAI ChatGPT, were tested with 60 artificial headache scenarios (30 primary and 30 secondary). Each chatbot was primed to classify the headaches based on the SNNOOP10 criteria. The explanations provided by the chatbots were screened for errors, which were categorized as clinical reasoning, hallucination, logic, math, misdiagnosis, or misinterpretation.

Results:

Bing Copilot and OpenAI each correctly classified 59 of 60 cases (98%) using the SNNOOP10 criteria. Each system misclassified one primary headache scenario, and the two errors occurred in different scenarios. In their explanations, Bing Copilot made 9 errors and OpenAI made 7.
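The counts above imply per-class figures that the abstract does not state directly. The following is a minimal sketch of that arithmetic, assuming secondary headache is treated as the positive class and that each system's single error was a primary scenario labeled as secondary. The function name `screening_metrics` and its parameters are illustrative and are not taken from the study.

```python
def screening_metrics(secondary_correct, secondary_total,
                      primary_correct, primary_total):
    """Summarize classification counts, treating 'secondary headache'
    as the positive class (an assumption made for this sketch)."""
    if secondary_total <= 0 or primary_total <= 0:
        raise ValueError("each class needs at least one scenario")
    # Sensitivity: share of secondary headaches flagged as secondary.
    sensitivity = secondary_correct / secondary_total
    # Specificity: share of primary headaches labeled as primary.
    specificity = primary_correct / primary_total
    # Accuracy: share of all scenarios classified correctly.
    accuracy = (secondary_correct + primary_correct) / (
        secondary_total + primary_total
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Counts reported for each chatbot: all 30 secondary scenarios correct,
# 29 of 30 primary scenarios correct.
reported = screening_metrics(30, 30, 29, 30)
for name, value in reported.items():
    print(f"{name}: {value:.1%}")
```

Under these assumptions the same figures apply to both chatbots, since each made exactly one error and both errors were in primary scenarios. With only 30 scenarios per class, these are point estimates with wide uncertainty.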

Conclusions:

Both chatbots classified headaches with near-perfect accuracy using the SNNOOP10 criteria and were particularly sensitive to secondary headaches, which often require urgent evaluation; neither system's single misclassification involved a secondary scenario. However, the explanation errors (9 for Bing Copilot, 7 for OpenAI) outnumbered the classification errors and highlight the need for improvements in clinical reasoning. Despite these errors, ChatGPT shows promise as an initial tool for differentiating secondary from primary headaches, potentially aiding in urgent evaluations for serious conditions.

DOI: 10.1212/WNL.0000000000208941
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.