TY - JOUR
T1 - Duplex output software effort estimation model with self-guided interpretation
AU - Mensah, Solomon
AU - Keung, Jacky
AU - Bosu, Michael Franklin
AU - Bennin, Kwabena Ebo
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2018/2
Y1 - 2018/2
N2 - Context: Software effort estimation (SEE) plays a key role in predicting the effort needed to complete a software development task. However, conclusion instability across learners has affected the implementation of SEE models. This instability can be attributed to the lack of an effort classification benchmark that software researchers and practitioners can use to facilitate and interpret prediction results. Objective: To ameliorate the conclusion instability challenge by introducing a classification and self-guided interpretation scheme for SEE. Method: We first used the density quantile function to discretise the effort recorded in 14 datasets into three classes (low, moderate and high) and built regression models for these datasets. The result of each regression model was an effort estimate, termed output 1, which was then classified into an effort class, termed output 2. We refer to the models generated in this study as duplex output models because they return two outputs. The duplex output models, trained with leave-one-out cross-validation and evaluated with MAE, BMMRE and adjusted R², can be used to predict both the software effort and the class of the software effort estimate. Robust statistical tests (Welch's t-test and the Kruskal-Wallis H-test) were used to examine statistically significant differences in the models’ prediction performance. Results: We observed the following: (1) the duplex output models not only predicted the effort estimates but also offered a guide to interpreting the effort expended; (2) incorporating the genetic search algorithm into the duplex output model allowed the sampling of relevant features for improved prediction accuracy; and (3) ElasticNet, a hybrid regression, provided superior prediction accuracy over the ATLM, the state-of-the-art baseline regression. Conclusion: The results show that the duplex output model provides a self-guided benchmark for interpreting estimated software effort. ElasticNet can also serve as a baseline model for SEE.
AB - Context: Software effort estimation (SEE) plays a key role in predicting the effort needed to complete a software development task. However, conclusion instability across learners has affected the implementation of SEE models. This instability can be attributed to the lack of an effort classification benchmark that software researchers and practitioners can use to facilitate and interpret prediction results. Objective: To ameliorate the conclusion instability challenge by introducing a classification and self-guided interpretation scheme for SEE. Method: We first used the density quantile function to discretise the effort recorded in 14 datasets into three classes (low, moderate and high) and built regression models for these datasets. The result of each regression model was an effort estimate, termed output 1, which was then classified into an effort class, termed output 2. We refer to the models generated in this study as duplex output models because they return two outputs. The duplex output models, trained with leave-one-out cross-validation and evaluated with MAE, BMMRE and adjusted R², can be used to predict both the software effort and the class of the software effort estimate. Robust statistical tests (Welch's t-test and the Kruskal-Wallis H-test) were used to examine statistically significant differences in the models’ prediction performance. Results: We observed the following: (1) the duplex output models not only predicted the effort estimates but also offered a guide to interpreting the effort expended; (2) incorporating the genetic search algorithm into the duplex output model allowed the sampling of relevant features for improved prediction accuracy; and (3) ElasticNet, a hybrid regression, provided superior prediction accuracy over the ATLM, the state-of-the-art baseline regression. Conclusion: The results show that the duplex output model provides a self-guided benchmark for interpreting estimated software effort. ElasticNet can also serve as a baseline model for SEE.
KW - Duplex output
KW - Effort classification
KW - Effort estimation
KW - Multiple regression models
UR - http://www.scopus.com/inward/record.url?scp=85030449177&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2017.09.010
DO - 10.1016/j.infsof.2017.09.010
M3 - Article
AN - SCOPUS:85030449177
SN - 0950-5849
VL - 94
SP - 1
EP - 13
JO - Information and Software Technology
JF - Information and Software Technology
ER -