An automatic software vulnerability classification framework using term frequency-inverse gravity moment and feature selection

Jinfu Chen, Patrick Kwaku Kudjo, Solomon Mensah, Selasie Aformaley Brown, George Akorfu

Research output: Contribution to journal › Article › peer-review

29 Citations (Scopus)

Abstract

Vulnerability classification is an important activity in software development and software quality maintenance. A typical vulnerability classification model involves a term-selection stage, in which the relevant terms are identified via feature selection; a term-weighting stage, in which the document weights for the selected terms are computed; and a classifier-learning stage. The term frequency-inverse document frequency (TF-IDF) model is the most widely used term-weighting metric for vulnerability classification. However, several issues hinder the effectiveness of the TF-IDF model for document classification. To address these issues, we propose and evaluate a general framework for vulnerability severity classification using the term frequency-inverse gravity moment (TF-IGM). Specifically, we extensively compare TF-IGM, TF-IDF, and information gain feature selection using five machine learning algorithms on ten vulnerable software applications containing a total of 27,248 security vulnerabilities. The experimental results show that: (i) the TF-IGM model is a promising term-weighting metric for vulnerability classification compared to the classical term-weighting metric, (ii) the effectiveness of feature selection on vulnerability classification varies significantly across the studied datasets, and (iii) feature selection improves vulnerability classification.
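
The abstract names TF-IGM without stating its formula. As a rough illustration only, the sketch below follows the definition commonly given in the term-weighting literature: for a term whose per-class document frequencies, sorted in descending order, are f_1 >= f_2 >= ... >= f_m, igm = f_1 / sum(f_r * r), and the weight is tf * (1 + lambda * igm). The function names, the toy vulnerability descriptions, the severity labels, and the choice lambda = 7.0 are all assumptions made for this example; none of them is taken from the paper.

```python
from collections import Counter, defaultdict


def igm_factors(docs, labels, lam=7.0):
    """Return {term: 1 + lam * igm(term)} for tokenized docs with class labels.

    lam is an adjustable coefficient; 7.0 is an assumed default here.
    """
    # class_df[term][label] = number of documents of that class containing the term
    class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            class_df[term][label] += 1

    factors = {}
    for term, per_class in class_df.items():
        # Rank class frequencies from largest to smallest; classes in which
        # the term never occurs contribute zero and can be left out.
        ranked = sorted(per_class.values(), reverse=True)
        moment = sum(f * r for r, f in enumerate(ranked, start=1))
        factors[term] = 1.0 + lam * ranked[0] / moment
    return factors


def tf_igm(tokens, factors):
    """Weight one tokenized document: raw term frequency times the IGM factor."""
    tf = Counter(tokens)
    # Terms unseen during fitting get a neutral factor of 1.0 (an assumption).
    return {term: count * factors.get(term, 1.0) for term, count in tf.items()}


# Hypothetical toy data: three tokenized descriptions with severity labels.
docs = [
    ["buffer", "overflow", "remote"],
    ["remote", "crash"],
    ["overflow", "heap"],
]
labels = ["high", "low", "high"]

factors = igm_factors(docs, labels)
# "overflow" occurs only in "high" documents, so igm = 1 and its factor is 1 + 7 = 8.
# "remote" is split evenly across both classes, so its factor is smaller.
weights = tf_igm(["overflow", "overflow", "remote"], factors)
```

A term concentrated in a single class receives the largest factor, while a term spread evenly over classes is pushed toward the plain term frequency; this class-awareness is what distinguishes the scheme from TF-IDF, which ignores labels.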

Original language: English
Article number: 110616
Journal: Journal of Systems and Software
Volume: 167
Publication status: Published - Sep 2020

Keywords

  • Classification
  • Feature selection
  • Machine learning algorithms
  • Severity
  • Software vulnerability
  • Term-weighting
