TY - JOUR
T1 - Evaluation of data imbalance algorithms on the prediction of credit card fraud
AU - Otoo, Godlove
AU - Appati, Justice Kwame
AU - Yaokumah, Winfred
AU - Soli, Michael Agbo Tettey
AU - Nwolley, Stephane Jnr
AU - Ludu, Julius Yaw
N1 - Publisher Copyright: Copyright © 2021, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
PY - 2021/10/1
Y1 - 2021/10/1
N2 - Credit card fraud has been rising for some years since the introduction of card payment systems. To curb this menace, computational methods have been proposed. Unfortunately, the data available for such studies is highly skewed, resulting in a data imbalance problem. In this study, the authors investigate the performance of selected data imbalance algorithms employed in the prediction of credit card fraud. A dataset from Kaggle containing 284,315 genuine transactions and 492 fraudulent transactions was used for the evaluation. The machine learning algorithms deployed for the study are logistic regression, naïve Bayes, and k-nearest neighbour, with F1 score and precision-recall area under the curve (PR AUC) as the metrics. With the neighbourhood cleaning rule used for undersampling, numerical assessment of the adopted algorithm gave an F1 score of 82.5% and a PR AUC of 81%.
KW - Credit Card
KW - Fraud Data
KW - Logistic Regression
KW - Machine Learning
KW - Resampling
UR - http://www.scopus.com/inward/record.url?scp=85118294724&partnerID=8YFLogxK
U2 - 10.4018/IJIIT.289967
DO - 10.4018/IJIIT.289967
M3 - Article
AN - SCOPUS:85118294724
SN - 1548-3657
VL - 17
JO - International Journal of Intelligent Information Technologies
JF - International Journal of Intelligent Information Technologies
IS - 4
ER -