Scalable Extraction of Seizure Frequency Information from Clinical Notes Using the Truveta Language Model
Sean Stern1, Clarence Wade1, Mehraveh Salehi2, Sarah Gilson2, Sumati Ramsinghani2, Sadra Naddaf Shargh2, Saman Zarandioon2, Megan Lipps-Choi2, Esther Kim2, Sally Omidvar2, Louis Ferrari1
1SK Life Science, Inc., 2Truveta, Inc.
Objective:
To extract seizure frequency information at scale from clinical notes in order to examine longitudinal outcomes in patients with epilepsy treated with cenobamate.
Background:
Extracting clinical observations such as seizure frequency from electronic medical records (EMRs) is extremely challenging, yet it is critical to scaling the use of real-world data in research on outcomes in patients with epilepsy. The Truveta Language Model (TLM) is a specialized mixture of large language models designed to extract structured information from unstructured clinical notes. It also identifies clinical concepts and their interrelations within those notes.
Design/Methods:
The TLM was trained on Truveta’s clinical notes, which were provided by a subset of the 31+ US health systems and then annotated and reviewed by internal clinical experts. To limit bias and ensure a comprehensive view, we considered clinical notes from all provider types. Clinical experts evaluated the model on an independent sample of clinical notes drawn from a different set of patients.
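The abstract does not say how the independent evaluation sample was drawn. As a minimal sketch of one way to keep evaluation patients separate from training patients, the Python below splits notes at the patient level. The note layout (a dict with a `patient_id` key), the function name `split_by_patient`, and the `eval_fraction` default are all assumptions made for illustration, not details from the study.

```python
import random


def split_by_patient(notes, eval_fraction=0.2, seed=0):
    """Split notes so that no patient appears in both sets.

    `notes` is a list of dicts with a "patient_id" key (hypothetical layout).
    Returns (train_notes, eval_notes).
    """
    # Deduplicate and sort patient IDs so the shuffle is reproducible.
    patient_ids = sorted({note["patient_id"] for note in notes})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Hold out a fraction of patients (at least one) for evaluation.
    n_eval = max(1, round(len(patient_ids) * eval_fraction))
    eval_ids = set(patient_ids[:n_eval])

    train_notes = [n for n in notes if n["patient_id"] not in eval_ids]
    eval_notes = [n for n in notes if n["patient_id"] in eval_ids]
    return train_notes, eval_notes
```

Splitting by patient rather than by note avoids the situation where several notes from the same patient land on both sides of the split, which could inflate evaluation scores.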
Results:
With EMRs from 2,480 patients receiving cenobamate, we used 17 types of notes (e.g., progress notes, consults, history/physical) to train the model. Patient timelines were constructed from seizure counts, seizure frequency, changes in seizure frequency, and the temporality of seizure mentions. The extracted data identified trends in seizure activity over time, providing a detailed picture of each patient's seizure condition before and after initiating cenobamate. The model achieved an extraction confidence ratio of 97% high-confidence notes to 3% low-confidence notes, reflecting high precision and recall.
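To illustrate how extracted mentions could be assembled into a patient timeline and compared around treatment initiation, here is a minimal Python sketch. It is not the TLM pipeline: the `SeizureMention` record, its fields (a note date, a frequency normalized to seizures per month, and a "high"/"low" confidence label), and the three helper functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class SeizureMention:
    """One extracted seizure-frequency mention (hypothetical schema)."""
    note_date: date
    seizures_per_month: float
    confidence: str  # "high" or "low"


def build_timeline(mentions):
    """Keep high-confidence mentions and order them chronologically."""
    kept = [m for m in mentions if m.confidence == "high"]
    return sorted(kept, key=lambda m: m.note_date)


def mean_frequency_before_after(timeline, start_date):
    """Mean seizures per month before vs. on or after the treatment start date.

    Returns None for a period that has no mentions.
    """
    before = [m.seizures_per_month for m in timeline if m.note_date < start_date]
    after = [m.seizures_per_month for m in timeline if m.note_date >= start_date]
    return (mean(before) if before else None, mean(after) if after else None)


def high_confidence_share(mentions):
    """Fraction of mentions labeled high-confidence, or None if there are none."""
    if not mentions:
        return None
    return sum(m.confidence == "high" for m in mentions) / len(mentions)
```

In this sketch, low-confidence mentions are dropped before the timeline is built. A real analysis might instead route them to manual review, a choice the abstract does not specify.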
Conclusions:
The TLM is a powerful model that extracts seizure frequency information with high accuracy and confidence. Researchers can use the extracted data to scale the way they study epilepsy, for example by examining the effectiveness of antiseizure medications, longitudinal outcomes in patients with epilepsy, the impact of various interventions on seizure frequency, and rates of seizure freedom.
DOI: 10.1212/WNL.0000000000208549
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.