Machine-learning Assisted Swallowing Assessment: A Deep Learning-based Quality Improvement Tool to Screen for Post-stroke Dysphagia
Arjun Balachandar¹, Rami Saab¹, Hamza Mahdi¹, Eptehal Nashnoush¹, Houman Khosravani¹
¹Sunnybrook Health Sciences Centre, University of Toronto
Objective:
To develop a proof-of-concept machine learning classifier, based on voice analysis, that screens for post-stroke dysphagia, with the aim of reducing screening subjectivity and potentially improving access to screening by bedside providers.
Background:
Post-stroke dysphagia is common and is associated with significant morbidity and mortality. Existing approaches to dysphagia assessment include the gold standard (the barium swallow test) and screening tools administered by trained professionals. Each approach has drawbacks, including cost, human-resource requirements, and subjectivity. Patients often must wait for a swallowing assessment and are not permitted to take food orally in the meantime, which negatively affects their quality of life and outcomes. In this study, we examine the use of convolutional neural networks (CNNs) to rapidly classify patient swallowing status from voice samples alone.
Design/Methods:
Vocal samples from 68 post-stroke patients on the neurovascular ward at Sunnybrook Hospital (Toronto, Canada) were studied (mean age 68 ± 16 years), with 40 patients in the training cohort and 28 in the testing cohort. Samples consisted of vowel sounds and the speech components of the National Institutes of Health Stroke Scale (NIHSS). Patients were labeled according to their dysphagia screening status on the Toronto Bedside Swallowing Screening Test. The vocal samples were then segmented into 1,579 audio clips (0.5-s clips with 50% overlap) and converted into 6,655 Mel-spectrograms (224 × 224-pixel images), which were used to train two convolutional neural networks (DenseNet and ConvNeXt), both separately and as an ensemble.
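The segmentation and spectrogram steps above can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the authors' pipeline: the abstract gives only the clip length (0.5 s), the overlap (50%), and the image size (224 × 224), so the sampling rate (16 kHz), FFT size (512), hop length (128), number of mel bands (64), the HTK-style mel formula, and nearest-neighbour resizing are all assumptions. The helper names (`segment_clips`, `log_mel_spectrogram`, `resize_nearest`) are hypothetical, and a synthetic tone stands in for a patient recording.

```python
import numpy as np

# Assumed parameters; the abstract specifies only 0.5-s clips, 50% overlap and 224x224 images.
SR = 16000   # sampling rate in Hz (assumption)
N_FFT = 512  # STFT window length in samples (assumption)
HOP = 128    # STFT hop length in samples (assumption)
N_MELS = 64  # number of mel bands (assumption)


def segment_clips(signal, sr, clip_sec=0.5, overlap=0.5):
    """Split a 1-D waveform into fixed-length clips; a trailing partial clip is dropped."""
    clip_len = int(round(clip_sec * sr))
    step = int(round(clip_len * (1.0 - overlap)))
    if len(signal) < clip_len:
        return np.empty((0, clip_len), dtype=signal.dtype)
    n_clips = 1 + (len(signal) - clip_len) // step
    return np.stack([signal[i * step:i * step + clip_len] for i in range(n_clips)])


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale (assumption)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centres evenly spaced on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)    # rising edge
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)  # falling edge
    return fb


def log_mel_spectrogram(clip, sr=SR, n_fft=N_FFT, hop=HOP, n_mels=N_MELS):
    """Hann-windowed STFT power projected onto mel bands, in dB; shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack([clip[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel.T + 1e-10)              # small floor avoids log(0)


def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a 2-D array to size x size (assumed resizing method)."""
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[np.ix_(rows, cols)]


# Usage on a synthetic 2-second, 220 Hz tone standing in for a recorded vowel.
t = np.arange(2 * SR) / SR
voice = np.sin(2 * np.pi * 220.0 * t)
clips = segment_clips(voice, SR)
images = np.stack([resize_nearest(log_mel_spectrogram(c)) for c in clips])
print(clips.shape, images.shape)  # (7, 8000) (7, 224, 224)
```

With 50% overlap, each 0.5-s clip starts 0.25 s after the previous one, so a 2-s recording yields seven clips. In practice a library routine such as `librosa.feature.melspectrogram` would typically replace the hand-written filterbank; the explicit version is shown only to make each step visible.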
Results:
Clip-level and patient-level swallowing-status predictions were obtained with an unweighted averaging ensemble. The ensemble network achieved an F1-score of 0.81 and an area under the receiver operating characteristic curve (AUROC) of 0.912, with a sensitivity of 0.89 and a specificity of 0.79.
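The ensembling and evaluation described above can be sketched as follows. The abstract does not say how patient-level predictions were formed from clip-level outputs, which class was treated as positive, or what decision threshold was used, so this minimal sketch assumes that each network outputs a per-clip probability of a failed swallowing screen, that the two networks' probabilities are averaged with equal weights, that a patient's score is the mean of their clip scores, and that the threshold is 0.5. The function names are hypothetical and the probabilities are made-up placeholders, not study data.

```python
import numpy as np


def ensemble_average(prob_sets):
    """Unweighted mean of per-clip positive-class probabilities from several models."""
    return np.mean(np.stack(prob_sets), axis=0)


def patient_scores(clip_probs, patient_ids):
    """Mean clip probability per patient (assumed aggregation rule); ids are returned sorted."""
    ids = np.unique(patient_ids)
    return ids, np.array([clip_probs[patient_ids == p].mean() for p in ids])


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity and F1 at a fixed decision threshold (assumed 0.5).
    No guard against zero denominators; intended for illustration only."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


def auroc(y_true, y_prob):
    """AUROC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical per-clip outputs for four patients (two clips each); not study data.
densenet_probs = np.array([0.9, 0.7, 0.2, 0.4, 0.6, 0.8, 0.7, 0.6])
convnext_probs = np.array([0.8, 0.9, 0.1, 0.3, 0.3, 0.5, 0.5, 0.6])
patient_ids = np.array(["A", "A", "B", "B", "C", "C", "D", "D"])
labels = {"A": 1, "B": 0, "C": 1, "D": 0}  # 1 = failed dysphagia screen (assumed positive class)

clip_probs = ensemble_average([densenet_probs, convnext_probs])
ids, scores = patient_scores(clip_probs, patient_ids)
y_true = np.array([labels[p] for p in ids])
sens, spec, f1 = threshold_metrics(y_true, scores)
print(sens, spec, round(f1, 3), auroc(y_true, scores))  # 1.0 0.5 0.8 0.75
```

The pairwise form of AUROC is shown because it makes the reported value concrete: an AUROC of 0.912 means that a randomly chosen positive case receives a higher score than a randomly chosen negative case about 91% of the time. For real evaluation, `sklearn.metrics.roc_auc_score` computes the same quantity.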
Conclusions:
Our study demonstrates the feasibility and effectiveness of applying state-of-the-art CNNs to classify Mel-spectrogram images of vocalizations for the detection of post-stroke dysphagia. This study is relevant to healthcare professionals caring for stroke patients and may offer an avenue for developing rapid, non-invasive, and more objective dysphagia screening tools.