An Optimal Spacing Approach for Sampling Small-sized Datasets for Software Effort Estimation

Samuel Abedu, Solomon Mensah, Frederick Boafo, Eva Bushel, Elizabeth Akuafum

Research output: Contribution to journal › Conference article › peer-review

Abstract

Context: There has been a growing research focus on conventional machine learning techniques for software effort estimation (SEE). However, few studies have assessed the performance of deep learning approaches in SEE, because SEE datasets are relatively small. Purpose: This study seeks to define a threshold for small-sized datasets in SEE and to investigate the performance of selected conventional machine learning and deep learning models on small-sized datasets. Method: Plausible SEE datasets, together with their numbers of project instances and features, are extracted from the existing literature and ranked. Eubank’s optimal spacing theory is used to discretize the ranking of the project instances into three classes (small, medium and large). Five conventional machine learning models and two deep learning models are trained on each dataset classified as small-sized, using leave-one-out cross-validation. The mean absolute error is used to assess the prediction performance of each model. Result: Findings from the study contradict existing knowledge by demonstrating that deep learning models provide better prediction performance than the conventional machine learning models on small-sized datasets. Conclusion: Deep learning can be adopted for SEE with the application of regularisation techniques.
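
The evaluation protocol named in the Method (leave-one-out cross-validation scored by mean absolute error) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the mean-effort baseline model and the toy data below are all assumptions made for illustration, and the study's actual models and datasets are not reproduced here.

```python
from statistics import mean


def loocv_mae(features, efforts, fit, predict):
    """Hold out each project once, train on the rest, and return the
    mean absolute error over all held-out predictions."""
    abs_errors = []
    for i in range(len(efforts)):
        # Training set is every project except the i-th one.
        train_x = features[:i] + features[i + 1:]
        train_y = efforts[:i] + efforts[i + 1:]
        model = fit(train_x, train_y)
        abs_errors.append(abs(efforts[i] - predict(model, features[i])))
    return mean(abs_errors)


# Hypothetical baseline (an assumption, not one of the study's models):
# always predict the mean effort of the training projects.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, x):
    return model


# Toy data, invented for illustration: one size feature per project
# and an effort value per project.
features = [[10.0], [20.0], [30.0], [40.0]]
efforts = [100.0, 200.0, 300.0, 400.0]

mae = loocv_mae(features, efforts, fit_mean, predict_mean)
print(f"LOOCV MAE: {mae:.2f}")
```

Any model exposing a comparable fit/predict pair could be passed in place of the baseline. With n projects the loop trains n models, which is affordable precisely because the datasets in question are small.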

Original language: English
Pages (from-to): 462-467
Number of pages: 6
Journal: Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE
Volume: 2023-July
Publication status: Published - 2023
Event: 35th International Conference on Software Engineering and Knowledge Engineering, SEKE 2023 - Hybrid, San Francisco
Duration: 1 Jul 2023 to 10 Jul 2023

Keywords

  • Conventional machine learning
  • Deep learning
  • Optimal spacing theory
  • Small-sized
  • Software effort estimation
