Identification of hematological biomarkers and assessment of machine learning models for sickle cell anemia severity classification

  • Francis Abeku Ussher
  • Edwin Ferguson Laing
  • Alex Bismark Atta-Owusu
  • Ernest Kissi Kontor
  • Nityanand Jain
  • Sylvester Yao Lokpo
  • Evans Asamoah Adu
  • Samuel Ametepe
  • Ruth Tetteh
  • Yvonne Dei-Adomakoh
  • Robert Amadu Ngala
  • Koforidua Technical University
  • Kwame Nkrumah University of Science and Technology
  • University of Cape Coast Ghana
  • University of Health and Allied Sciences
  • Kumasi Centre for Collaborative Research in Tropical Medicine (KCCR)
  • Korle Bu Teaching Hospital

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: Sickle cell anemia (SCA) is a severe form of sickle cell disease (SCD). Given the rising global disease burden and the unpredictable clinical outcomes, reliable methods for predicting disease severity are needed.

Methods: Our study involved 481 participants, including 356 SCA patients and 125 healthy controls, who presented at the Korle-Bu Teaching Hospital, Ghana. Using a mixed-methods approach, we performed a biomarker identification analysis, followed by an assessment of several machine learning (ML) models for predicting the severity of SCA.

Results: Significant correlations were observed between immune cells, erythrocyte indices, and bilirubin, highlighting the chronic inflammatory state and hemolytic nature of the disease. A principal component analysis (PCA) revealed strong correlations of immune cells and erythrocyte indices with the first two principal components (PCA1 and PCA2), indicating a significant influence of immune pathways and erythropoiesis. The all-variable model achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.98 with a predictive accuracy of 92.4%. The model identified direct and total bilirubin, reticulocyte count, hydrogen sulfide, and neutrophil count as the five biomarkers with the highest average importance (scores >1.2). Further ML assessment for prediction of SCA severity showed excellent discriminating performance for the C5.0 decision tree (C5.0), Random Forest (RF), XGBoost (XGB), and bagged trees (TREEBAG) models, with AUC-ROC ≥80% and area under the precision-recall curve (AUC-PR) ≥85%.

Conclusions: We identified key biomarkers associated with immune response, erythropoiesis, and oxidative stress that could serve as surrogate endpoints in clinical trials.
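The two discrimination metrics reported in the abstract, AUC-ROC and AUC-PR, can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `average_precision` are hypothetical, AUC-PR is approximated here by average precision (one common estimator, assumed for illustration), and the labels and scores are made-up toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: precision at each distinct score threshold,
    weighted by the increase in recall at that threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ap = 0.0
    prev_recall = 0.0
    # Sweep thresholds from the highest score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        predicted = sum(1 for s in scores if s >= t)
        true_pos = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        precision = true_pos / predicted
        recall = true_pos / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data for illustration only (1 = severe, 0 = not severe); hypothetical values.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"AUC-ROC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"AUC-PR (average precision): {average_precision(toy_labels, toy_scores):.3f}")
```

AUC-ROC summarizes ranking quality across all thresholds, while AUC-PR focuses on the positive class and is more sensitive to class imbalance, which may be why both are reported together.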

Original language: English
Article number: yoaf020
Journal: Journal of Sickle Cell Disease
Volume: 2
Issue number: 1
DOIs
Publication status: Published - 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • blood biomarkers
  • disease severity
  • machine learning
  • prediction
  • sickle cell disease
