EEG-VL: Integrating Visual Features with Large Language Models for Automated Seizure Detection
Nan Lin1, Zi Liang2, Qiang Lu1, Weifang Gao1, Heyang Sun1, Peng Hu2, Lian Li2
1Peking Union Medical College Hospital, 2NetEase Media Technology Co.
Objective:

To address the increasing demand for accurate EEG-based epileptic seizure detection, we propose EEG-VL, a novel vision–language framework that treats EEG signals as visual patterns and integrates them with large language models to improve seizure detection.

Background:

Traditional approaches often struggle to capture the complex spatiotemporal patterns inherent in EEG signals and typically lack high-level contextual understanding, limiting their applicability in real-world clinical settings. Recent advances in large language models (LLMs) have opened new opportunities for semantic reasoning in biomedical tasks, offering a promising avenue toward more sophisticated, semantically informed detection methods.
Design/Methods:

A pretrained EfficientNet encoder is used to extract abstract visual features from EEG representations, which are embedded into structured prompts and processed by the Qwen language model. This design synergistically combines the spatial modeling capabilities of convolutional networks with the semantic reasoning strengths of large language models. We conduct experiments on two widely used benchmark datasets for seizure detection, the TUH EEG Seizure Corpus (TUSZ) and the CHB-MIT Scalp EEG Database. To address the class imbalance commonly present in seizure datasets, we adopt a logit adjustment strategy based on label distribution priors.
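As an illustration of the class-imbalance handling described above, the following is a minimal sketch of a logit-adjustment step based on label distribution priors. The function name, the temperature parameter `tau`, and the example class counts are illustrative assumptions, not details from the abstract: the idea is simply to shift each class logit by the log of its empirical prior so the rare seizure class is not overwhelmed by the majority non-seizure class.

```python
import numpy as np

def adjust_logits(logits: np.ndarray, class_counts: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Shift raw logits by the log label prior (hypothetical sketch).

    Classes with small priors (e.g. seizure segments) receive a larger
    additive boost, counteracting the label-distribution bias.
    """
    priors = class_counts / class_counts.sum()   # empirical label distribution
    return logits - tau * np.log(priors)         # rare classes get boosted

# Illustrative two-class setting: 95% non-seizure vs. 5% seizure labels.
counts = np.array([9500.0, 500.0])
raw = np.array([[2.0, 1.5]])        # raw logits favour the majority class
adj = adjust_logits(raw, counts)    # after adjustment, the seizure logit gains more
```

Because the adjustment is a simple additive correction applied to the logits, it can be used either during training (as an adjusted loss) or post hoc at inference without retraining the encoder.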

Results:

We compared EEG-VL against several classic EEG networks, including EEGNet, EEG-TCNet, ShallowCNN-Tran, EEGNet-Tran, and EEG2ViT. Extensive experiments on the TUSZ and CHB-MIT datasets demonstrate that EEG-VL achieves state-of-the-art performance. On TUSZ, our model attains an AUROC of 0.9466 and an AUPRC of 0.7599, surpassing the previous best results by 0.80% and 8.19%, respectively. On CHB-MIT, EEG-VL achieves an average AUROC of 0.91 and an average AUPRC of 0.47, surpassing the previous best performance by 1% and 14.6%, respectively.

Conclusions:

Our study underscores the potential of the proposed vision–language paradigm for robust, scalable, and clinically applicable EEG-based seizure detection. By integrating a pretrained visual encoder with a large language model, EEG-VL achieves strong performance on multiple public datasets, supporting its promise for clinical adoption.

10.1212/WNL.0000000000215336
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.