Neuro-Copilot AI: Advanced LLM Framework for Neurological Patients in the Emergency Room
Alon Gorenshtein1, Shiri Fistel1, Moran Sorka2, Gregory Telman1, Raz Winer1, Shlomi Peretz4, Dvir Aran3, Shahar Shelly1
1Department of Neurology, Rambam Medical Center, 2AI in neurology Laboratory, 3Faculty of Biology, The Taub Faculty of Computer Science, Technion – Israel Institute of Technology, 4Department of Neurology, Shamir Medical Center
Objective:
To generate a language-based neurological framework that identifies neurologically high-risk patients in acute care settings.
Background:
Neurological decisions in the emergency department (ED) impact healthcare delivery, increase morbidity, and impose an economic burden due to inefficient resource utilization and increased treatment expenses. The increasing prevalence of neurological disorders, together with the shortage of neurologists, accentuates the need for a solution.
Design/Methods:
We developed a large language model (LLM) framework augmented by prompt engineering, retrieval-augmented generation (RAG) based on historical cases, XGBoost, and logistic regression. We then tested the model on consecutive patients who received neurological consultations in the ED (n=1368). Primary endpoints were admission and mortality. Results were also compared against the judgments of blinded senior neurologists who were unaware of the actual decision and the LLM outputs.
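The abstract does not describe how retrieval over historical cases was implemented. Purely as an illustration of the general RAG idea, the following minimal sketch ranks stored cases by cosine similarity to a new case and places the closest ones into a prompt. The function names, the `vec`/`summary`/`outcome` fields, the toy vectors, and the prompt wording are all hypothetical assumptions, not the authors' method.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar_cases(query_vec, case_bank, k=2):
    """Return the k historical cases most similar to the query vector.

    case_bank is a list of dicts with hypothetical keys:
    'id', 'vec' (embedding), 'summary', and 'outcome'.
    """
    ranked = sorted(
        case_bank,
        key=lambda case: cosine(query_vec, case["vec"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(note, retrieved):
    """Assemble a prompt that shows retrieved cases before the new note."""
    lines = ["Similar historical cases:"]
    for case in retrieved:
        lines.append(f"- {case['summary']} (outcome: {case['outcome']})")
    lines.append("New case:")
    lines.append(note)
    lines.append("Estimate the risk of admission.")
    return "\n".join(lines)


# Toy, made-up data for illustration only (not study data).
CASE_BANK = [
    {"id": "A", "vec": [1.0, 0.0], "summary": "acute hemiparesis", "outcome": "admitted"},
    {"id": "B", "vec": [0.0, 1.0], "summary": "chronic headache", "outcome": "discharged"},
    {"id": "C", "vec": [0.9, 0.1], "summary": "sudden aphasia", "outcome": "admitted"},
]
```

In a setup like this, the embedding vectors would come from a text-embedding model, and the assembled prompt would be passed to the LLM. How the XGBoost and logistic regression components are combined with the LLM output is not specified in the abstract, so it is not sketched here.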
Results:
We included 1368 patients over a 2-month period; median age was 58.6 years [38.37-74.5], 48.46% were male, and 45.68% were admitted. There was no significant demographic or racial bias toward admission or discharge, but we noted a higher rate of night-shift consultations among admitted patients (36% vs. 18.7%, p < 0.001). The Neuro-Copilot AI framework achieved an AUC of 0.91 for predicting general admission, an AUC of 0.92 for predicting admission to a neurological department, an AUC of 0.92 for long-term mortality risk, and an AUC of 0.96 for 48-hour mortality risk. Three blinded experts were used for validation. The Fleiss' kappa for admission was only 0.21, reflecting the inherent subjectivity in clinical admission decisions. However, Neuro-Copilot AI predictions showed a strong correlation with the average expert score (Pearson correlation 0.79, p < 0.001).
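For readers unfamiliar with the three statistics reported above, the following minimal sketch shows how AUC, Fleiss' kappa, and the Pearson correlation are defined. It uses the textbook formulas in plain Python; the function names are illustrative, and nothing here reproduces the study's data or analysis code.

```python
from math import sqrt


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j; every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

With three raters and a binary admit/discharge decision, each row of `counts` would hold two numbers summing to 3, for example `[2, 1]` when two experts chose admission and one chose discharge. The Pearson correlation would then be computed between the model's predicted scores and the per-patient average of the expert scores.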
Conclusions:
We demonstrate that an LLM-based classifier effectively identifies high-risk patients and provides highly accurate predictions of 48-hour mortality. This represents a crucial first step toward integrating such models into the fast-paced environment of emergency departments.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.