This study aims to construct a Vision Transformer-based sleep staging model that provides objective and accurate assessment of sleep structure, thereby advancing research on epileptic discharges and improving clinical diagnostic capabilities.
Sleep staging is critical in EEG interpretation, particularly for detecting interictal epileptiform discharges and seizures, both of which occur more frequently during sleep. Visual sleep staging suffers from subjectivity and low efficiency, limitations that automated sleep staging techniques can potentially overcome. However, current models are developed using polysomnography (PSG) data, which makes them unsuitable for standard EEG recordings. There remains a significant gap in automated sleep staging models designed specifically for EEG-based applications.
The sleep staging model vEpiSleepNet consists of four modules: sleep feature extraction, cascaded multi-layer dilated convolution encoding, temporal context encoding, and MLP classification. It takes all nineteen EEG channels as input to preserve signal integrity, unlike previous models that employ fewer than 10 channels. This study used the Peking Union Medical College Hospital (PUMCH) dataset, which contains 140 h of EEG data from 152 patients, comprising 3763 wake, 3219 N1, 5728 N2, 3253 N3, and 875 rapid eye movement (REM) 30-second segments. Five-fold cross-validation was applied. The model was compared with several established sleep staging networks: DeepSleepNet, SeqSleepNet, AttnSleep, MMASleepNet, and SleepTransformer. It was further developed and validated on two classic public PSG benchmarks to demonstrate the consistency of its performance.
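The abstract names a cascaded multi-layer dilated convolution encoder but does not give its kernel sizes, dilation rates, channel mixing, or the EEG sampling rate. The NumPy sketch below illustrates only the generic technique on a 19-channel, 30-second epoch, under loudly hypothetical assumptions: 100 Hz sampling, kernel size 3, dilations 1, 2, 4, 8, a depthwise (per-channel) layout, and ReLU between layers. It is not vEpiSleepNet's implementation. Feeding a unit impulse through the stack shows how many input samples each output sample sees.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1D dilated cross-correlation of one channel.

    x: (T,) signal; w: (K,) kernel taps; dilation: spacing between taps.
    The output has the same length T as the input.
    """
    k = len(w)
    span = (k - 1) * dilation              # extent of the dilated kernel minus one
    pad_left = span // 2
    xp = np.pad(x, (pad_left, span - pad_left))
    # Tap j reads the padded input shifted by j * dilation samples.
    return sum(w[j] * xp[j * dilation : j * dilation + len(x)] for j in range(k))

def cascaded_dilated_encoder(x, kernels, dilations):
    """Stack of dilated conv + ReLU layers applied to each channel separately.

    x: (C, T) multichannel epoch, e.g. C = 19 EEG channels (assumed layout).
    kernels / dilations: one kernel and one dilation rate per layer (hypothetical).
    """
    h = np.asarray(x, dtype=float)
    for w, d in zip(kernels, dilations):
        h = np.stack([dilated_conv1d(ch, w, d) for ch in h])
        h = np.maximum(h, 0.0)              # ReLU non-linearity
    return h

def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical settings: 19 channels, 100 Hz sampling, one 30-second epoch.
C, FS, EPOCH_S = 19, 100, 30
dilations = [1, 2, 4, 8]
kernels = [np.ones(3) for _ in dilations]   # all-ones taps so ReLU never zeroes the response
epoch = np.zeros((C, FS * EPOCH_S))
epoch[:, 1500] = 1.0                        # unit impulse in every channel
features = cascaded_dilated_encoder(epoch, kernels, dilations)
print(features.shape, receptive_field(3, dilations), np.count_nonzero(features[0]))
# → (19, 3000) 31 31
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth (31 samples after four layers with kernel size 3) while the number of weights grows only linearly, which is the usual motivation for dilated encoders over plain stacked convolutions.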
Experimental results show that vEpiSleepNet achieves state-of-the-art performance, attaining 72.7% accuracy and an F1-score of 0.732 on the PUMCH dataset, surpassing the previous best results by 9% and 10%, respectively. On the two public PSG datasets, vEpiSleepNet also showed superior performance, with accuracies of 81.9% and 80.1%, respectively. Ablation experiments showed that dilated convolution encoding significantly improves model performance. In addition, the number of input channels directly affects performance, with more channels yielding better results.
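Accuracy and F1-score can diverge on an imbalanced five-stage dataset such as PUMCH. The abstract does not state how its F1-score is averaged; the sketch below assumes macro-averaged F1 (the unweighted mean of per-stage F1), which is common in sleep staging work, and uses a hypothetical label order W, N1, N2, N3, REM with toy labels that are not study data.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]   # label order assumed for illustration

def confusion_matrix(y_true, y_pred, n_classes=len(STAGES)):
    """Rows are reference (true) stages, columns are predicted stages."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy_and_macro_f1(y_true, y_pred, n_classes=len(STAGES)):
    """Return (accuracy, macro-F1, per-stage F1), assuming macro averaging."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)
    true_totals = cm.sum(axis=1)
    zeros = np.zeros(n_classes)
    # Guard against stages that never occur or are never predicted.
    precision = np.divide(tp, pred_totals, out=zeros.copy(), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=zeros.copy(), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, f1.mean(), f1

# Toy labels (not from the study): two epochs per stage.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 0]
acc, mf1, per_stage = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro-F1={mf1:.3f}")
# → accuracy=0.800  macro-F1=0.787
print({s: round(float(v), 3) for s, v in zip(STAGES, per_stage)})
# → {'W': 0.8, 'N1': 0.667, 'N2': 0.8, 'N3': 1.0, 'REM': 0.667}
```

Because macro-F1 weights every stage equally, a rare stage such as REM (875 of the 16,838 PUMCH segments) influences it as much as N2 does, whereas accuracy is dominated by the frequent stages.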
This study successfully developed vEpiSleepNet, a novel Vision Transformer-based sleep staging model designed specifically for EEG monitoring, which demonstrates superior performance across multiple datasets.