The impact of cluster resolution feature selection on pattern recognition and classification for detecting Sudan dye adulteration in palm oil

Joanna K. Kwao, Cheetham Mingle, John N. Addotey, Kwabena F.M. Opuni, Lawrence A. Adutwum

Research output: Contribution to journal › Article › peer-review

Abstract

This study evaluates the performance of several commonly used chemometric and machine learning techniques, namely principal component analysis (PCA), artificial neural network (ANN), k-nearest neighbors (KNN), logistic regression discriminant analysis (LRDA), partial least squares discriminant analysis (PLSDA), support vector machine (SVM), and gradient boosted decision tree (GBDT), on HATR-FTIR data for detecting Sudan dye adulteration in palm oil. We employed Icoshift for data alignment and Savitzky-Golay smoothing to enhance data quality. Cluster resolution feature selection (CRFS) selected 2.39 % of the 3351 features. Using only the 80 selected features, PCA models showed a clear separation between adulterated and pure palm oil samples and an increase in explained variance that had not previously been observed. LRDA, PLSDA and SVM showed improved training true positive rate (TPR), accuracy (ACC) and Matthews correlation coefficient (MCC) after feature selection. KNN showed improvement in all model quality parameters after feature selection.
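The model quality parameters named in the abstract (TPR, ACC, MCC) are standard binary-classification metrics computed from a confusion matrix. The sketch below shows how they relate; treating "adulterated" as the positive class and the counts in the usage example are illustrative assumptions, not values from the study, and the function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return TPR, ACC and MCC for a binary confusion matrix.

    Assumption for illustration: the positive class is "adulterated" and the
    negative class is "pure" palm oil.
    """
    total = tp + fn + fp + tn

    # True positive rate (sensitivity): share of adulterated samples detected.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0

    # Accuracy: share of all samples assigned to the correct class.
    acc = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient: uses all four cells, range [-1, 1].
    # By convention it is set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"TPR": tpr, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(classification_metrics(tp=45, fn=5, fp=3, tn=47))
```

With these hypothetical counts the function gives TPR = 0.9 and ACC = 0.92, while MCC (about 0.84) is lower because it also penalizes the false positives. MCC is often preferred over accuracy when the two classes are unequal in size, since it accounts for every cell of the confusion matrix.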

Original language: English
Article number: 112433
Journal: Microchemical Journal
Volume: 208
Publication status: Published - Jan 2025

Keywords

  • Adulteration
  • Chemometrics
  • Feature Selection
  • FTIR
  • Machine Learning
  • Palm Oil
  • Sudan Dyes
