COPE: Chain-of-thought Outcome Prediction Engine for Open-source Large Language Model-Based Stroke Outcome Prediction from Clinical Notes
Yongkai Liu1, Helena Feng1, Bin Jiang1, Max Wintermark3, David Liebeskind4, Michael Moseley1, Maarten Lansberg2, Gregory Albers2, Jeremy Heit1, Greg Zaharchuk1
1Department of Radiology, 2Department of Neurology, Stanford University, 3Department of Neuroradiology, University of Texas MD Anderson Cancer Center, 4Neurovascular Imaging Research Core at UCLA
Objective:
To evaluate a Chain-of-thought Outcome Prediction Engine (COPE)—a reasoning-enhanced LLM framework—for predicting acute ischemic stroke (AIS) outcomes, and to compare its performance with traditional ML models, Clinical BERT, and GPT-4.1.
Background:
Predicting patient outcomes is central to personalized care. While clinical notes such as discharge summaries contain rich context, their unstructured format limits use in traditional models. Advances in large language models (LLMs) offer new ways to harness this data.
Design/Methods:
We analyzed 464 AIS patients from Stanford University Hospital (2010–2023) with discharge summaries and 90-day modified Rankin Scale (mRS, 0–6) outcomes. COPE uses a two-step Chain-of-Thought (CoT) framework with sequential LLaMA-3–8B models: the first generates a clinical reasoning rationale from the discharge summary, and the second predicts the mRS score from that rationale. Performance was compared with Clinical BERT, a support vector machine trained on structured clinical variables (Clinical SVM), and GPT-4.1. Ablation studies assessed the impact of the reasoning component and of individual discharge-summary sections. Model performance was evaluated by mean absolute error (MAE), percentage within 1 mRS point (±1 ACC), and exact accuracy (ACC).
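The two-step pipeline can be sketched as follows. This is a minimal illustration, not the study code: `query_llm` is a hypothetical stand-in for a locally hosted LLaMA-3–8B call (e.g., via Hugging Face transformers), stubbed here with canned responses so the control flow runs end to end, and the prompt wording and output format are assumptions.

```python
# Minimal sketch of a two-step chain-of-thought prediction pipeline.
# `query_llm` is a hypothetical stand-in for a local LLaMA-3-8B call;
# it is stubbed with fixed responses so the example is self-contained.
import re

def query_llm(prompt: str) -> str:
    # Stub: a real deployment would invoke a locally hosted LLM here.
    if "Reason step by step" in prompt:
        return ("Discharged home on aspirin with mild residual weakness, "
                "suggesting slight disability at 90 days.")
    return "Predicted mRS: 2"

def predict_mrs(discharge_summary: str) -> int:
    # Step 1: a reasoning model produces a chain-of-thought rationale.
    rationale = query_llm(
        "Reason step by step about this patient's likely 90-day outcome.\n"
        f"Discharge summary:\n{discharge_summary}"
    )
    # Step 2: a second model maps the rationale to a 90-day mRS score (0-6).
    answer = query_llm(
        "Given the reasoning below, output 'Predicted mRS: <0-6>'.\n"
        f"Reasoning:\n{rationale}"
    )
    match = re.search(r"Predicted mRS:\s*([0-6])", answer)
    if match is None:
        raise ValueError(f"Unparseable model output: {answer!r}")
    return int(match.group(1))

print(predict_mrs("Patient discharged home; mild left-sided weakness."))
```

Keeping the reasoning and scoring steps in separate model calls makes the rationale an explicit, inspectable intermediate output rather than hidden state.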
Results:
COPE achieved an MAE of 1.00 (95% CI: 0.91–1.08), ±1 ACC of 75% (71–79%), and exact ACC of 33% (29–38%), matching GPT-4.1 [MAE: 1.00 (0.91–1.08), ±1 ACC: 78% (74–82%), ACC: 32% (27–36%); p = 1.00, 0.17, 0.62] and outperforming Clinical BERT [MAE: 1.28 (1.17–1.38), ±1 ACC: 62% (58–67%), ACC: 28% (24–32%); p < 0.001, p < 0.001, p = 0.05] and Clinical SVM [MAE: 1.28 (1.18–1.38), ±1 ACC: 61% (56–66%), ACC: 27% (23–31%); p < 0.001, p < 0.001, p = 0.03]. It also surpassed its non-reasoning variant [MAE: 1.28 (1.19–1.38), ±1 ACC: 64% (60–69%), ACC: 23% (19–28%)]. Text ablation showed the largest performance drop when the Medications and Discharge & Follow-up Summary sections were removed.
Conclusions:
COPE, a reasoning-enhanced dual-LLM framework, matched GPT-4.1 and outperformed traditional models, offering an accurate, interpretable, privacy-preserving approach to stroke outcome prediction from unstructured text.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.