2025 American Academy of Neurology Abstract Website

Objective:

This study evaluates ChatGPT-4’s ability to differentiate between primary and secondary headaches using the SNNOOP10 criteria.

Background:

Headaches are a prevalent issue, accounting for 3.5 million annual emergency department (ED) visits in the United States. They are classified into primary and secondary headaches, with secondary headaches potentially indicating serious underlying conditions. The SNNOOP10 mnemonic helps screen for red flags suggesting secondary headaches. Artificial Intelligence (AI) algorithms have shown promise in assisting with patient triage in EDs and primary care settings.

Design/Methods:

Two ChatGPT-4 chatbots, Microsoft Bing Copilot and OpenAI, were tested with 60 artificial headache scenarios (30 primary and 30 secondary). Each chatbot was primed to classify the headaches using the SNNOOP10 criteria. The explanations provided by the chatbots were screened for errors, which were categorized as clinical reasoning, hallucination, logic, math, misdiagnosis, or misinterpretation.

Results:

Bing Copilot and OpenAI each correctly classified 59 of 60 cases (98%) using the SNNOOP10 criteria. Each system made one classification error, in both cases on a primary headache scenario, though the two errors occurred in different scenarios. Bing Copilot made 9 explanation errors and OpenAI made 7.
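
The reported counts can be turned into standard screening metrics with a short calculation. The sketch below is illustrative only and is not part of the study's analysis. It assumes secondary headache is treated as the positive class, and it infers the per-class counts (30 of 30 secondary and 29 of 30 primary scenarios correct for each chatbot) from the statement that each system's single error fell on a primary headache scenario. The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(secondary_correct, secondary_total,
                      primary_correct, primary_total):
    """Accuracy, sensitivity, and specificity for a two-class screen.

    Assumption: secondary headache is the positive class, so
    sensitivity is the fraction of secondary scenarios flagged as
    secondary and specificity is the fraction of primary scenarios
    left as primary.
    """
    total = secondary_total + primary_total
    return {
        "accuracy": (secondary_correct + primary_correct) / total,
        "sensitivity": secondary_correct / secondary_total,
        "specificity": primary_correct / primary_total,
    }


# Counts inferred from the abstract; identical for both chatbots.
metrics = screening_metrics(secondary_correct=30, secondary_total=30,
                            primary_correct=29, primary_total=30)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 98.3%
# sensitivity: 100.0%
# specificity: 96.7%
```

Under these assumptions, both chatbots missed no secondary headache scenario, and the single error per chatbot lowers specificity rather than sensitivity.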

Conclusions:

Both chatbots classified headaches with near-perfect accuracy using the SNNOOP10 criteria and were particularly sensitive to secondary headaches, which often require urgent evaluation. However, explanation errors (9 and 7) outnumbered classification errors (1 per chatbot), which highlights the need for improvements in clinical reasoning. Despite these errors, ChatGPT shows promise as an initial tool for differentiating secondary from primary headaches, potentially aiding urgent evaluation of serious conditions.