A Comparative Study of Shallow and Deep Learning Models for Predicting Post-Operative Complications in Neurosurgical and Clinical Applications with Real-world Example
Ram Saha1, Ahmed Shaheen2, Belal Hamed3, Nour Shaheen2, Ahmed Negida1, Mostafa Eltobgy4
1Virginia Commonwealth University, 2Alexandria Faculty of Medicine, 3Faculty of Medicine, Al-Azhar University, Cairo, 4Department of Neurological Surgery, The Ohio State University Wexner Medical Center
Objective:
To explore the potential uses of LLM and SML models in predicting post-operative complications in patients with cervical spondylosis, and to compare the two approaches in terms of accuracy, cost-effectiveness, patient confidentiality, and data security.
Background:

This study compares Shallow Machine Learning (SML) and Large Language Models (LLM) in predicting post-operative complications in neurosurgical applications.

 

Design/Methods:

Data were extracted from the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) registry. The postoperative outcomes evaluated included infections, cardiorespiratory events, thrombosis, bleeding, readmission, and reoperation. We employed multivariate logistic regression, machine learning algorithms, and nomograms for the analyses.

Results:

A total of 13,287 patients were included, with postoperative complications occurring in 5.4%. The most common complication was infection (2.3%). For predicting any adverse event, the Best AutoML algorithm had the highest performance, achieving an AUC of 0.7989. The RuleFit Model excelled in predicting cardiovascular events (AUC of 0.7688) and infections (AUC of 0.7885).
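
The AUC values above can be read as the probability that a model assigns a higher predicted risk to a randomly chosen patient who had a complication than to a randomly chosen patient who did not. The following is a minimal Python sketch of that pairwise definition. It is illustrative only and is not the evaluation code used in this study; the function name roc_auc and the toy labels and scores are hypothetical and are not drawn from the NSQIP data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = complication occurred, 0 = no complication.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
# Hypothetical predicted risks from some model, one per patient.
toy_scores = [0.02, 0.10, 0.35, 0.04, 0.08, 0.01, 0.30, 0.60]

# 13 of the 15 positive/negative pairs are ranked correctly.
print(round(roc_auc(toy_labels, toy_scores), 3))  # prints 0.867
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; on a registry-sized cohort a rank-based implementation from an established library would be the more practical choice.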

Among the LLMs, the Llama 3 8B model achieved a prediction accuracy of 70% with a training time of 2.5 hours for one epoch. The BioMedLM model reached 60% accuracy for any complication, while the BioMistral model achieved 77% accuracy with a training time of 4 hours for 3 epochs.

Conclusions:

SML models are cost-effective and suitable for many clinical applications, whereas LLMs require costly training, maintenance, and engineering. The LLMs evaluated here still need further training, fine-tuning, and testing; training on larger datasets may improve their results.

Model link: https://huggingface.co/ShaheenLab/DR_SHAHEENAI

DOI: 10.1212/WNL.0000000000211130
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.