Detecting Important Terms in Source Code for Program Comprehension

Rodeghero, Paige; McMillan, Collin

Files

0746.pdf (296.74 KB)

Date

2019-01-08

Authors

Rodeghero, Paige

McMillan, Collin

Abstract

Software engineering research has become extremely dependent on terms (words in textual data) extracted from source code. Different techniques have been proposed to extract the most "important" terms from code. These terms are typically used as input to research prototypes, so the quality of a prototype's output depends on the quality of the term extraction technique. At present, no consensus exists about which technique predicts the best terms for code comprehension. We perform a literature review and propose a unified prediction model based on a Naive Bayes algorithm. We evaluate our model in a field study with professional programmers, as well as a standard 10-fold synthetic study. We found that our model predicts the top quartile of the most important terms with approximately 50% precision and recall, outperforming other popular techniques. We also found that the predictions from our model help programmers to the same degree as the gold set.

Keywords

Software Product Lines and Platform Ecosystems: Engineering, Services, and Management, Software Technology, Source Code Terms, Program Comprehension

URI

http://hdl.handle.net/10125/60186

Extent

10 pages

Related To

Proceedings of the 52nd Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Software Product Lines and Platform Ecosystems: Engineering, Services, and Management
