TY - JOUR
T1 - Addressing Class Imbalance Problem in Health Data Classification
T2 - Practical Application From an Oversampling Viewpoint
AU - Agyemang, Edmund Fosu
AU - Mensah, Joseph Agyapong
AU - Nyarko, Eric
AU - Arku, Dennis
AU - Mbeah-Baiden, Benedict
AU - Opoku, Enock
AU - Noye Nortey, Ezekiel Nii
N1 - Publisher Copyright: Copyright © 2025 Edmund Fosu Agyemang et al. Applied Computational Intelligence and Soft Computing published by John Wiley & Sons Ltd.
PY - 2025
Y1 - 2025
N2 - While analyzing health data is important for improving health outcomes, class imbalance in datasets poses major challenges to machine learning (ML) classification models. This work therefore considers the class imbalance problem in stroke prediction using models such as K-nearest neighbors, support vector machine (SVM), logistic regression, random forest, and decision tree. The stroke dataset is balanced through several oversampling strategies: random oversampling (RO), ADASYN, SMOTE, and SMOTE–Tomek. Compared to the results on the imbalanced dataset, all applied oversampling techniques improved the correct classification of stroke events by the ML models. Among these, RO–SVM with an RBF kernel performed best in terms of sensitivity, specificity, G-mean, F1-score, and accuracy, with respective values of 89.87%, 94.91%, 92.36%, 89.64%, and 89.87%. After oversampling, all the ML classifiers classified stroke status adequately, especially for the minority class. This study highlights the importance of class imbalance issues in health datasets. Detection of minority-class instances can be improved considerably by combining classification models with hybrid strategies that address class imbalance, which, in turn, will help improve healthcare outcomes. Further research applying more advanced deep learning techniques to other imbalanced health datasets is encouraged to validate or refine class imbalance approaches, as effective handling of imbalanced classes can substantially improve predictive model performance in healthcare analysis.
KW - class imbalance
KW - health data classification
KW - oversampling techniques
KW - stroke prediction
UR - https://www.scopus.com/pages/publications/105000751822
U2 - 10.1155/acis/1013769
DO - 10.1155/acis/1013769
M3 - Article
AN - SCOPUS:105000751822
SN - 1687-9724
VL - 2025
JO - Applied Computational Intelligence and Soft Computing
JF - Applied Computational Intelligence and Soft Computing
IS - 1
M1 - 1013769
ER -