1. Develop machine learning models to predict neurocognitive decline in HIV-infected children in Zambia.
2. Compare standard regression-based (SRB) and group-based trajectory modeling (GBTM) techniques for modeling cognitive decline using data-driven approaches.
An estimated 66,000 children in Zambia are HIV-infected. Despite combination antiretroviral therapy (cART), HIV-associated neurocognitive disorders (HAND) remain a significant complication. Data-driven machine learning tools can help elucidate factors that predict cognitive decline and can support clinical interventions for HAND in Zambia.
This is a sub-study of the HIV-Associated Neurocognitive Disorders in Zambia (HANDZ) longitudinal prospective study. Data collected over a 2-year period from 208 perinatally infected HIV+ children and 208 HIV-exposed, uninfected controls were used to train logistic regression with LASSO regularization (LR), random forest (RF), and support vector machine (SVM) algorithms. Cognitive status, the outcome of interest, was assessed via a comprehensive neuropsychological testing battery and modeled using SRB and GBTM. Model performance was measured as area under the receiver operating characteristic curve (AUC-ROC).
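As an illustration of the performance metric only, the following minimal sketch computes AUC-ROC from its rank-based definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function name `auc_roc` and the toy labels and scores are hypothetical and are not taken from the HANDZ data or the study's actual analysis code.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank (Mann-Whitney U) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = cognitive decline, 0 = no decline.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
toy_auc = auc_roc(y_true, y_score)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two outcome groups, which is the scale on which the values reported for LR, RF, and SVM should be read.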
With SRB modeling, LR performed best on average (AUC = 0.795), followed by SVM (AUC = 0.790) and RF (AUC = 0.755). With GBTM, LR again performed best (AUC = 0.743), followed by RF (AUC = 0.709) and SVM (AUC = 0.673). There were no statistically significant differences in performance (p > 0.05) between the two modeling techniques. Adding HIV-specific variables to the non-HIV-specific features improved model performance. Worst recorded WHO stage, CD4 count, and nadir CD4 count were the most predictive HIV-specific factors, while school performance, height and weight percentiles, grade, history of stunting, and socioeconomic status index were the most predictive non-HIV-specific variables.