Large Language Models in Epilepsy Care: A Systematic Review of Current Applications
Vandana Venkatesh1, Vrindha Prakash2, Reena Alame1, Lisa Shields3, Cemal Karakas4, Zulfi Haneef1
1Baylor College of Medicine, 2University of California, San Diego, 3Norton Healthcare, 4Children's National Hospital
Objective:

To evaluate the current applications and performance of large language models (LLMs) in epilepsy care, including diagnosis, prognosis, and treatment decisions, through a systematic review.

Background:

LLMs like ChatGPT and Gemini have recently emerged as powerful tools in epilepsy care, capable of efficiently processing and summarizing unstructured clinical text. However, the scope, performance, and utility of LLMs in epilepsy care have not been systematically assessed.

Design/Methods:

A systematic literature search across PubMed, Embase, and Web of Science identified studies published through September 2025 that applied LLMs to epilepsy care, including diagnosis and treatment guidance. Data collected included model type, task, and performance metrics. Machine learning studies without language processing were excluded.

Results:

From 7,598 unduplicated studies, after excluding animal studies, case reports, and observational studies (n=507) and title screening (n=5,895), 1,196 studies remained. Following abstract screening (n=1,131), 65 studies underwent full-text review. LLMs were applied to clinical text, EEG reports, and imaging, with variable diagnostic performance: reported sensitivity for epilepsy diagnosis ranged from 17.6% to 96.9% and specificity from 26.7% to 81.4%. While ChatGPT diagnosed epileptic syndromes and structural etiologies well (90.0% accuracy), it struggled with ambiguous cases such as unknown seizure onset (12.5% accuracy); human experts achieved near-perfect accuracy. LLMs also provided insights into patient concerns and mental health based on conversations in online epilepsy communities. In patient education, Google Gemini and ChatGPT produced accurate guides with similar reliability, although Gemini scored higher on ease of understanding. Both lacked depth and visual aids.

Conclusions:

LLMs demonstrate broad potential to improve epilepsy care by automating analyses of unstructured clinical data and supporting diagnostic, prognostic, and treatment decisions. Current applications include extracting data from clinical notes, diagnosing epilepsy, localizing epileptogenic zones, predicting subtypes, and aiding patient education through AI chatbots. LLM performance and accuracy need to be improved through targeted training with large, diverse datasets.

10.1212/WNL.0000000000216289
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.