Validating Patient Symptoms in the Electronic Health Record with Large Language Models for Scalable Tracking of Symptoms in Neuro-oncology
John Rhee1, Thomas Sounack2, Zachary Tentor2, Joshua Davis4, Brigitte Durieux4, Paul Miller4, Rameen Beroukhim3, Charlotta Lindvall2
1Neuro-Oncology, Dana Farber Cancer Institute, Harvard Medical School, 2Supportive Oncology, 3Neuro-Oncology, Dana Farber Cancer Institute, 4Dana Farber Cancer Institute
Objective:

To validate symptoms extracted from the electronic health record (EHR) by a large language model (LLM), with the aim of scaling symptom extraction from the EHR.

Background:

Patients with brain tumors report high symptom burden, but there is little symptom research in this patient population. A methodology that scales symptom extraction from the EHR can facilitate better understanding of symptom pathophysiology and burden. Advances in LLMs may provide a means for scalable tracking of patient symptoms.

Design/Methods:

Our dataset comprised 500 neuro-oncology clinical notes from 2020-2023, labeled for the following symptoms: headache, fatigue, nausea, and anxiety. A physician labeled all clinical notes (gold standard) for whether each symptom was present or negated. The physician labels were compared with those of an LLM, GPT-4o, which assessed whether each symptom was affirmed or negated. We ran the LLM using 1) the full clinical note and 2) selected sections of each note (interval history, assessment and plan). We calculated the performance of the LLM against the gold standard using standard metrics (precision, recall, and F1-score).
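As an illustration of the comparison described above, the following is a minimal sketch of how precision, recall, and F1-score could be computed for a single symptom from paired labels. The function name `binary_metrics`, the boolean encoding (True = affirmed, False = negated), and the toy labels are assumptions made for illustration; the abstract does not describe the actual implementation.

```python
def binary_metrics(gold, predicted):
    """Return (precision, recall, f1) for paired boolean labels.

    gold      -- physician labels (True = symptom affirmed), hypothetical encoding
    predicted -- LLM labels in the same order and encoding
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count agreement and disagreement on the positive (affirmed) class.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for one symptom across six notes (illustrative only, not study data).
gold_labels = [True, True, False, True, False, False]
llm_labels = [True, False, True, True, False, False]

precision, recall, f1 = binary_metrics(gold_labels, llm_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this sketch, a note where the LLM affirms a symptom that the physician did not counts as a false positive, which lowers precision; a missed physician-affirmed symptom counts as a false negative, which lowers recall.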

Results:

Across all symptoms, the average precision was 83.6%, average recall 91.6%, and average F1-score 86.9% (n=424). The LLM performed best with headache, resulting in 92.8% precision, 95.4% recall, and an F1-score of 94.1%, followed by nausea (94.4% precision, 85.0% recall, and an F1-score of 89.5%), fatigue (74.3% precision, 98.7% recall, and an F1-score of 84.8%), and anxiety (73.0% precision, 87.1% recall, and an F1-score of 79.4%). Running the LLM on the sectioned notes cost about a third as much as running it on the full notes ($1.62 vs $5.03 for 500 notes).
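The arithmetic behind the summary figures can be sketched as follows, assuming the reported averages are unweighted means over the four symptoms (the abstract does not state how they were averaged). The dictionary layout and variable names are illustrative; the numbers are the per-symptom values and costs reported above.

```python
# Per-symptom (precision, recall, F1) in percent, as reported in the Results.
reported = {
    "headache": (92.8, 95.4, 94.1),
    "nausea": (94.4, 85.0, 89.5),
    "fatigue": (74.3, 98.7, 84.8),
    "anxiety": (73.0, 87.1, 79.4),
}


def macro_average(scores):
    """Unweighted mean of each metric across symptoms (an assumed averaging scheme)."""
    n = len(scores)
    return tuple(sum(values[i] for values in scores.values()) / n for i in range(3))


avg_precision, avg_recall, avg_f1 = macro_average(reported)

# Cost of sectioned notes relative to full notes, for 500 notes.
cost_ratio = 1.62 / 5.03

print(f"mean precision={avg_precision:.2f} recall={avg_recall:.2f} f1={avg_f1:.2f}")
print(f"sectioned/full cost ratio={cost_ratio:.3f}")
```

Under this assumption the unweighted means agree with the reported averages to within rounding of the per-symptom values, and the cost ratio is a little under one third.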

Conclusions:

LLMs performed very well in detecting headache, fatigue, nausea, and anxiety. LLMs offer the ability to scale symptom extraction from the chart, which is crucial for better understanding symptom burden and powering symptom-related interventions and studies. Sectioned notes require less compute and cost less, making them the more environmentally friendly option.

10.1212/WNL.0000000000210712
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.