This study compares Shallow Machine Learning (SML) models and Large Language Models (LLMs) for predicting postoperative complications in neurosurgical applications.
Data were extracted from the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) registry. The postoperative outcomes evaluated included infections, cardiorespiratory events, thrombosis, bleeding, readmission, and reoperation. We employed multivariate logistic regression, machine learning algorithms, and nomograms for the analyses.
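For illustration, a minimal sketch of the SML baseline is shown below: a multivariate logistic regression predicting a binary "any complication" outcome, evaluated by AUC on a held-out split. The file name and column names are hypothetical placeholders rather than the study's actual NSQIP variables, and the AutoML, RuleFit, and nomogram analyses are not reproduced here.

```python
# Sketch of the SML baseline: logistic regression for "any complication",
# evaluated by AUC. File and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("nsqip_neurosurgery.csv")      # hypothetical NSQIP extract
X = df.drop(columns=["any_complication"])       # preoperative predictors
y = df["any_complication"]                      # 1 = any adverse event

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.4f}")
```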
A total of 13,287 patients were included, with postoperative complications occurring in 5.4%. The most common complication was infection (2.3%). For predicting any adverse event, the best AutoML model achieved the highest performance, with an AUC of 0.7989. The RuleFit model performed best for cardiovascular events (AUC 0.7688) and infections (AUC 0.7885).
Among the LLMs, the Llama 3 8B model achieved 70% prediction accuracy with a training time of 2.5 hours for one epoch. BioMedLM reached 60% accuracy for any complication, while BioMistral reached 77% accuracy with a training time of 4 hours for 3 epochs.
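A minimal sketch of how such LLM fine-tuning can be set up with the Hugging Face Transformers library is shown below: each patient record is serialized into a short text prompt ending with the observed outcome, and a causal language model is fine-tuned on those prompts. The model name, record fields, and hyperparameters are illustrative assumptions, not the study's exact configuration.

```python
# Sketch of LLM fine-tuning on serialized patient records.
# Model name, fields, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"   # placeholder; gated model

def record_to_prompt(rec):
    # Hypothetical serialization of a tabular row into text.
    return (f"Age: {rec['age']}. ASA class: {rec['asa']}. "
            f"Procedure: {rec['procedure']}. "
            f"Postoperative complication: {'yes' if rec['complication'] else 'no'}")

records = [
    {"age": 62, "asa": 3, "procedure": "craniotomy", "complication": 1},
    {"age": 45, "asa": 2, "procedure": "lumbar fusion", "complication": 0},
]  # toy examples; the study trained on the full NSQIP extract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = (Dataset.from_list([{"text": record_to_prompt(r)} for r in records])
           .map(tokenize, remove_columns=["text"]))

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llm_complications",
                           num_train_epochs=1,           # one epoch, as reported for Llama 3 8B
                           per_device_train_batch_size=1,
                           learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```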
SML models are cost-effective and suitable for many clinical application scenarios, whereas LLMs require costly training, maintenance, and engineering. The LLMs evaluated here need further training and testing, and there remains room for improvement through fine-tuning; training on larger datasets may substantially improve their results.
Model link: https://huggingface.co/ShaheenLab/DR_SHAHEENAI