Evaluating Human Pose Estimation Models in Hospitalized Patients
Justin Min1, Erika Juarez Martinez1, Jeremy Eagles1, Stephan Schuele2, Eyal Kimchi1
1Department of Neurology, Northwestern University, Feinberg School of Medicine, 2Department of Neurology, Comprehensive Epilepsy Center, Northwestern Memorial Hospital
Objective:

Evaluate the performance of pose estimation models in hospitalized patients undergoing video-EEG.

Background:

Monitoring activity in patients hospitalized on neurological services can be crucial for timely detection of conditions such as falls, delirium, and epilepsy, but remains labor-intensive. Continuous video-based monitoring using pose estimation models offers a scalable alternative to optimize clinical assessments. However, model performance remains unexplored in clinical populations, where variable postures and complex environments pose unique challenges.

Design/Methods:

Three state-of-the-art pre-trained human pose estimation models, YOLOv11, OpenPose, and MMPose, were evaluated in hospitalized patients undergoing video-EEG. Each model generated coordinates and confidence scores for facial and body keypoints. From each video, ten frames were randomly selected and manually annotated (17 common keypoints) to serve as ground truth and to define model-specific confidence thresholds. Sensitivity, specificity, balanced accuracy, and ROC curves were computed. Keypoint location accuracy was calculated as the Euclidean distance between ground truth and model detections.
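The detection and localization metrics described above can be sketched as follows. This is a minimal illustration with NumPy, not the authors' actual pipeline; the function names, data layout, and the notion of a per-keypoint "visible" ground-truth label are assumptions for the sake of the example.

```python
import numpy as np

def keypoint_errors(pred_xy, gt_xy):
    """Euclidean distance between predicted and ground-truth keypoints.

    pred_xy, gt_xy: arrays of shape (K, 2) holding (x, y) pixel
    coordinates for K keypoints (hypothetical layout).
    """
    return np.linalg.norm(pred_xy - gt_xy, axis=1)

def balanced_accuracy(conf, visible, threshold):
    """Balanced accuracy of keypoint detection at a confidence threshold.

    conf: model confidence score per keypoint.
    visible: boolean ground truth, True if the annotator marked the
    keypoint as present in the frame (an assumed labeling scheme).
    """
    detected = conf >= threshold
    tp = np.sum(detected & visible)      # correctly detected keypoints
    tn = np.sum(~detected & ~visible)    # correctly rejected keypoints
    fn = np.sum(~detected & visible)     # missed keypoints
    fp = np.sum(detected & ~visible)     # spurious detections
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)
```

Sweeping `threshold` over the range of confidence scores and plotting sensitivity against (1 − specificity) yields the ROC curve; the threshold maximizing balanced accuracy would then be fixed per model.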

Results:

We analyzed 46 videos from 24 hospitalized patients with diverse racial backgrounds. OpenPose showed the highest AUC in 52% of videos (24/46) and the highest overall balanced accuracy (BA) of 72% (53% sensitivity, 89% specificity). For specific body regions, OpenPose performed better for facial detection (BA 77%, 64% sensitivity, 90% specificity), while MMPose performed better for upper body (BA 74%, 62% sensitivity, 85% specificity) and lower body detection (BA 66%, 48% sensitivity, 82% specificity). Distance errors were smaller for facial keypoints than for torso and limb keypoints.

Conclusions:

State-of-the-art computer vision models face challenges when applied to clinical videos. Models were most accurate for facial keypoints and least accurate for lower body keypoints. Enhancing automated computer vision performance for inpatients in neurological services may require custom models trained on clinical datasets or the development of novel, potentially combined, model approaches.

10.1212/WNL.0000000000216986
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.