TY - GEN
T1 - An inception architecture-based model for improving code readability classification
AU - Mi, Qing
AU - Keung, Jacky
AU - Xiao, Yan
AU - Mensah, Solomon
AU - Mei, Xiupei
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/6/28
Y1 - 2018/6/28
AB - Code Readability Classification is the task of assigning a piece of source code to a Readable or Unreadable class. To build accurate classification models, existing studies handcraft features from different aspects that intuitively seem to correlate with code readability, and then explore various machine learning algorithms based on the newly proposed features. In contrast, our work tackles the problem with deep learning. Specifically, we propose IncepCRM, a novel model based on the Inception architecture that automatically learns multi-scale features from source code with little manual intervention. We use information about the human annotators as an auxiliary input for training IncepCRM and empirically evaluate IncepCRM on three publicly available datasets. The results show that: 1) annotator information improves model performance, as confirmed by robust statistical tests (i.e., the Brunner-Munzel test and Cliff's delta); 2) IncepCRM achieves higher accuracy than previously reported models across all datasets. These findings confirm the feasibility and effectiveness of deep learning for code readability classification.
KW - Code Readability Classification
KW - Deep Learning
KW - Empirical Software Engineering
KW - Inception Architecture
UR - http://www.scopus.com/inward/record.url?scp=85053686072&partnerID=8YFLogxK
U2 - 10.1145/3210459.3210473
DO - 10.1145/3210459.3210473
M3 - Conference contribution
AN - SCOPUS:85053686072
SN - 9781450364034
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018, EASE 2018
PB - Association for Computing Machinery
T2 - 22nd International Conference on Evaluation and Assessment in Software Engineering, EASE 2018
Y2 - 28 June 2018 through 29 June 2018
ER -