TY - JOUR
T1 - EBOLApred
T2 - A machine learning-based web application for predicting cell entry inhibitors of the Ebola virus
AU - Adams, Joseph
AU - Agyenkwa-Mawuli, Kwasi
AU - Agyapong, Odame
AU - Wilson, Michael D.
AU - Kwofie, Samuel K.
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/12
Y1 - 2022/12
AB - Ebola virus disease (EVD) is a highly virulent and often lethal illness that humans contract through contact with the body fluids of infected persons. The glycoprotein (GP) and the matrix protein VP40 play essential roles in the virus life cycle within the host: GP mediates entry and fusion of the virus with the host cell membrane, whereas VP40 is responsible for viral particle assembly and budding. This study aimed to develop machine learning models that predict small molecules as possible anti-Ebola virus compounds capable of inhibiting the activities of GP and VP40, using Ebola virus (EBOV) cell entry inhibitors from the PubChem database as training data. Predictive models were developed using five algorithms: random forest (RF), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), and logistic regression (LR). The models were evaluated using 10-fold cross-validation. The best-performing model was RF, with an accuracy of 89%, an F1 score of 0.9, and an area under the receiver operating characteristic curve (AUC) of 0.95. The LR and SVM models also performed reasonably well, with overall accuracies of 0.84 and 0.86, respectively. The RF, LR, and SVM models were deployed as a web server, EBOLApred, accessible via http://197.255.126.13:8000/.
KW - Ebola virus protein
KW - Inhibitors
KW - Logistic regression
KW - Machine learning
KW - Random forest
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85137274982&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2022.107766
DO - 10.1016/j.compbiolchem.2022.107766
M3 - Article
C2 - 36088668
AN - SCOPUS:85137274982
SN - 1476-9271
VL - 101
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
M1 - 107766
ER -