2026 American Academy of Neurology Abstract Website

Objective:

To evaluate the ability of GPT-4o to determine stroke etiology in 256 cases from the PhysioNet MIMIC-IV database.

Background:

Accurate determination of stroke etiology is essential for guiding secondary prevention and reducing recurrence risk. Recent advances in artificial intelligence, particularly large language models such as GPT-4o, have demonstrated emerging diagnostic capabilities across medical specialties. However, their potential role in vascular neurology and stroke classification remains largely unexplored.

Design/Methods:

A retrospective dataset of 256 patients with coded cerebrovascular accident (CVA) diagnoses was extracted from the MIMIC-IV database. For each case, structured variables and clinical text (history of present illness, laboratory data, and imaging reports) were provided to GPT-4o agents in standardized prompts. The model was instructed to assign each case a stroke etiology from a set of 10 granular etiologic categories. Model-generated outputs were compared with reference chart diagnoses, with adjudication by neurologists. Performance metrics included overall accuracy, precision, recall, and F1-score across etiologic subtypes.
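The per-subtype metrics named above can be computed one-vs-rest from paired reference and model labels. The following is a minimal sketch of that calculation, not the study's actual evaluation code. The function names and the toy labels are hypothetical and serve only as illustration; they are not study data.

```python
def overall_accuracy(reference, predicted):
    """Fraction of cases where the model label matches the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def per_class_metrics(reference, predicted):
    """One-vs-rest precision, recall (sensitivity), specificity, and F1 per label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(reference)
    pairs = list(zip(reference, predicted))
    metrics = {}
    for label in sorted(set(reference) | set(predicted)):
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        tn = n - tp - fp - fn
        # Guard against empty denominators by reporting 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return metrics


# Toy labels for illustration only (hypothetical, not from the study).
reference = ["cardioembolic", "cardioembolic", "lacunar", "lacunar", "dissection"]
predicted = ["cardioembolic", "cardioembolic", "lacunar", "cardioembolic", "dissection"]

accuracy = overall_accuracy(reference, predicted)
by_class = per_class_metrics(reference, predicted)
```

In this toy example one lacunar case is mislabeled as cardioembolic, so cardioembolic recall stays at 1.0 while its precision and specificity drop. High sensitivity can coexist with lower specificity for the same subtype in this way.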

Results:

GPT-4o achieved moderate diagnostic accuracy in classifying stroke etiology from de-identified records. Accuracy was highest for cardioembolic and small-vessel strokes, with lower precision in mixed or undetermined etiologies.

Cardioembolic stroke demonstrated the highest sensitivity (0.96) and F1 score (0.85), followed by lacunar stroke with sensitivity of 0.92 (95%) and F1 score of 0.82 (95%). Specificity was consistently high across all etiologies (>0.84), with cardioembolic and severe ECAD achieving 0.79 and 0.96, respectively.

Moderate performance was observed for arterial dissection (F1: 0.70, sensitivity: 0.93) and EVAS (F1: 0.69, sensitivity: 0.79).

Conclusions:

The current performance of GPT-4o indicates potential as a decision-support adjunct. Limitations include reliance on retrospective data and lack of prospective validation. Our study also illustrates how agentic LLMs can be incorporated into medical workflows. With appropriate guardrails and validation, LLMs like GPT-4o may assist clinicians in standardizing stroke classification, accelerating research workflows, and supporting clinical decision-making in vascular neurology.