An Empirical Study on Small-Sized Datasets Based on Eubank’s Optimal Spacing Theorem

Samuel Abedu, Solomon Mensah, Frederick Boafo

Research output: Contribution to journal › Article › peer-review

Abstract

Conventional machine learning methods for software effort estimation (SEE) have seen an increase in research interest. In contrast, few studies have evaluated how well deep learning techniques work in SEE, which can be attributed to the relatively small sizes of SEE datasets. The goal of this study is to establish a threshold for small-sized datasets in SEE and to investigate how well selected deep learning and traditional machine learning models perform on such datasets. Plausible SEE datasets are extracted from the existing literature and ranked along with their attributes and number of project cases. The ranking of project instances is discretized into three classes (small, medium, and large) using Eubank’s optimal spacing theorem. Using leave-one-out cross-validation, each small-sized dataset is used to train two deep learning models and five conventional machine learning models, and each model’s predictive performance is evaluated with its mean absolute error. Results show that, on small-sized datasets, deep learning models outperform traditional machine learning models in prediction accuracy, contradicting prior expectations. Regularisation techniques can be used in conjunction with deep learning to address SEE.
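
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the dataset sizes below are hypothetical, the 1/3 and 2/3 quantile cut points merely stand in for the exact spacings given by Eubank’s theorem, and a plain linear regressor substitutes for the seven models actually studied.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import mean_absolute_error

# Step 1: discretize ranked dataset sizes into three classes.
# Hypothetical project counts for a handful of SEE datasets.
sizes = np.array([18, 33, 62, 81, 145, 499, 1059])
lo, hi = np.quantile(sizes, [1/3, 2/3])  # placeholder cut points, not the theorem's exact spacings

def size_class(n):
    # Assign a dataset to small/medium/large based on the cut points.
    if n <= lo:
        return "small"
    return "medium" if n <= hi else "large"

labels = [size_class(n) for n in sizes]
print(dict(zip(sizes.tolist(), labels)))

# Step 2: leave-one-out cross-validation with MAE on one synthetic
# small-sized dataset (18 projects, 3 effort drivers).
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(18, 3))
y = X @ np.array([0.5, 1.2, 0.3]) + rng.normal(0, 5, 18)  # synthetic effort values

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append(mean_absolute_error(y[test_idx], pred))

print(f"LOOCV MAE: {np.mean(errors):.2f}")

In the study itself, this LOOCV loop would be repeated for each small-sized dataset and for each of the two deep learning and five conventional machine learning models, with the resulting mean absolute errors compared across model families.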

Original language: English
Article number: 1
Journal: SN Computer Science
Volume: 6
Issue number: 1
DOIs
Publication status: Published - Jan 2025

Keywords

  • Deep learning
  • Eubank’s optimal spacing theory
  • Small-sized
  • Software effort estimation
  • Traditional machine learning
