Investigating the use of exemplary data for software vulnerability prediction

Patrick Kwaku Kudjo, Solomon Mensah, Ebenezer Owusu, Justice Kwame Appati

Research output: Contribution to journal › Article › peer-review

Abstract

Vulnerability prediction models (VPMs) are statistical machine learning algorithms that are trained to identify vulnerable components in large software systems. Recently, a wide range of software metrics, such as the number of dependencies and the size of code between modules, have been evaluated as potential indicators (i.e., features) for building VPMs. Despite the success of these approaches, none of the resulting models has emerged as a clearly better performer in vulnerability prediction. This study investigates the use of exemplary data (i.e., Bellwether instances) for vulnerability prediction and explores the impact of Bellwether on VPMs. Specifically, we use n-grams to identify features of vulnerable Java code for improved prediction accuracy. We evaluate our approach on ten Java Android applications extracted from the F-Droid repository. Six machine learning algorithms are used, and the prediction results are evaluated in terms of precision, recall, F-measure, ROC-AUC, and Yuen’s statistical test. The findings indicate that the Bellwether method outperformed the growing portfolio, with F-measure values ranging from 18.5% to 94.4% across the studied datasets. The decision tree emerged as the best model (AUC value of 0.81) compared with the other classifiers when trained with Bellwether instances. Hence, we recommend the application of Bellwether instances when setting up VPMs.
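
As an illustration of the n-gram features mentioned in the abstract, the following minimal sketch counts token n-grams in a Java snippet. The abstract does not state how the code is tokenized or which value of n is used, so the regular-expression tokenizer, the function name `token_ngrams`, the choice n = 2, and the sample snippet are all assumptions made for illustration and are not the authors' pipeline.

```python
import re
from collections import Counter


def token_ngrams(source: str, n: int = 2) -> Counter:
    """Count contiguous token n-grams in a source string (illustrative only)."""
    # Assumed tokenizer: identifiers, integer literals, or single punctuation marks.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)
    # Slide a window of length n over the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical Java line that builds a SQL query by string concatenation.
snippet = 'String q = "SELECT * FROM users WHERE id=" + id;'
features = token_ngrams(snippet, n=2)
```

Counts of this kind could serve as the feature vector for one code unit; how such vectors are selected or weighted before training a classifier is not specified in the abstract.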

Original language: English
Journal: International Journal of System Assurance Engineering and Management
Publication status: Accepted/In press - 2025

Keywords

  • Bellwether method
  • Classification
  • Exemplary data
  • N-gram
  • Vulnerability prediction
