Predictive Models for Neurocognitive Decline in HIV+ Youth in Zambia Using a Machine Learning Approach
Mohammed Mehdi Shahid1, Gauri Patil2, Esau Mbewe3, Pelekelo Kabundula3, Sylvia Mwanza-Kabaghe3, Alexandra Buda4, Ruth Agwaze1, Heather Adams5, Milimo Mweemba3, Gretchen Birbeck6, David Bearden1
1University of Rochester School of Medicine, 2Icahn School of Medicine at Mount Sinai, 3University of Zambia, 4Baylor College of Medicine, 5University of Rochester, 6University of Rochester/CHET
Objective:

1. Develop machine learning models to predict neurocognitive decline in HIV-infected children in Zambia.

2. Compare standard regression-based (SRB) and group-based trajectory modeling (GBTM) techniques for characterizing cognitive decline using data-driven approaches.

Background:

An estimated 66,000 children in Zambia are HIV-infected. Despite combination antiretroviral therapy (cART), HIV-associated neurocognitive disorders (HAND) remain a significant complication. Data-driven machine learning tools can help elucidate factors that predict cognitive decline and can support clinical interventions for HAND in Zambia.

Design/Methods:

This is a sub-study of the HIV-Associated Neurocognitive Disorders in Zambia (HANDZ) longitudinal prospective study. Data collected over a 2-year period from 208 perinatally infected HIV+ children and 208 HIV-exposed, uninfected (HEU) controls were used to train three algorithms: logistic regression with LASSO regularization (LR), random forests (RF), and support vector machines (SVM). Cognitive status, the outcome of interest, was assessed with a comprehensive neuropsychological testing battery and modeled using SRB and GBTM. Model performance was measured as the area under the receiver operating characteristic curve (AUC-ROC).
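As an illustration of the performance metric only, AUC-ROC can be read as the probability that a randomly chosen case with the outcome receives a higher model score than a randomly chosen case without it. The sketch below computes it that way in plain Python. The function name `auc_roc`, the toy labels, and the scores are hypothetical and are not study data; the abstract does not state which software was used to compute AUC.

```python
from itertools import product


def auc_roc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 outcomes (1 = outcome present, e.g. decline).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive with every negative; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy example with made-up values (assumption: not taken from the study).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auc_roc(y_true, y_score))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise comparison is quadratic in sample size, which is acceptable for a cohort of a few hundred participants; a rank-based formulation would scale better for larger data.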

Results:

With SRB modeling, LR performed best on average (AUC = 0.795), followed by SVM (AUC = 0.790) and RF (AUC = 0.755). With GBTM, LR also performed best (AUC = 0.743), followed by RF (AUC = 0.709) and SVM (AUC = 0.673). There were no statistically significant differences in performance between the two modeling techniques (p > 0.05). Adding HIV-specific variables to the non-specific features improved model performance. Worst recorded WHO stage, CD4 counts, and nadir CD4 counts were the most predictive HIV-specific factors, while school performance, height and weight percentiles, grade, history of stunting, and socioeconomic status index were the most predictive non-HIV-specific variables.

Conclusions:

Machine learning can help elucidate factors that predict neurocognitive decline in HIV+ vs. HEU youth in Zambia. Incorporating imaging and inflammatory biomarkers might improve these models.
10.1212/WNL.0000000000205519