Detecting Important Terms in Source Code for Program Comprehension

Date
2019-01-08
Authors
Rodeghero, Paige
McMillan, Collin
Abstract
Software engineering research has become extremely dependent on terms (words in textual data) extracted from source code. Different techniques have been proposed to extract the most "important" terms from code. These terms are typically used as input to research prototypes, so the quality of those prototypes' output depends on the quality of the term extraction technique. At present, no consensus exists about which technique predicts the best terms for code comprehension. We perform a literature review and propose a unified prediction model based on a Naive Bayes algorithm. We evaluate our model in a field study with professional programmers, as well as in a standard 10-fold synthetic study. We found that our model predicts the top quartile of the most important terms with approximately 50% precision and recall, outperforming other popular techniques. We also found that the predictions from our model helped programmers to the same degree as the gold set.
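The abstract does not specify the model's features or training details. Purely as an illustration, the sketch below shows a Bernoulli Naive Bayes classifier that labels each term as "top quartile of importance" or not, together with the precision/recall measure and a k-fold evaluation of the kind the abstract mentions. The feature names (`in_method_name`, `in_signature`, `in_method_call`, `high_frequency`) and all function names are hypothetical assumptions, not the authors' feature set or implementation, and the paper is not claimed to use this particular Naive Bayes variant.

```python
import math
from collections import defaultdict

# Hypothetical binary features per term (NOT the paper's actual feature set).
FEATURES = ("in_method_name", "in_signature", "in_method_call", "high_frequency")


def train_bernoulli_nb(examples, alpha=1.0):
    """Train a Bernoulli Naive Bayes model.

    examples: list of (feature_dict, label), where label is True if the term
    is in the top quartile of importance. Laplace smoothing uses alpha.
    Returns {label: (log_prior, {feature: P(feature present | label)})}.
    """
    counts = {True: 0, False: 0}
    feat_on = {True: defaultdict(int), False: defaultdict(int)}
    for feats, label in examples:
        counts[label] += 1
        for f in FEATURES:
            if feats.get(f, False):
                feat_on[label][f] += 1
    total = counts[True] + counts[False]
    model = {}
    for label in (True, False):
        prior = (counts[label] + alpha) / (total + 2 * alpha)
        probs = {
            f: (feat_on[label][f] + alpha) / (counts[label] + 2 * alpha)
            for f in FEATURES
        }
        model[label] = (math.log(prior), probs)
    return model


def predict(model, feats):
    """Return the label with the highest log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for f in FEATURES:
            p = probs[f]
            # Bernoulli model: absent features contribute log(1 - p).
            score += math.log(p if feats.get(f, False) else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (important-term) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_eval(examples, k=10):
    """Round-robin k-fold split; returns mean precision and recall over folds."""
    folds = [examples[i::k] for i in range(k)]
    precisions, recalls = [], []
    for i, test in enumerate(folds):
        if not test:
            continue  # skip empty folds when k exceeds the number of examples
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_bernoulli_nb(train)
        y_true = [label for _, label in test]
        y_pred = [predict(model, feats) for feats, _ in test]
        p, r = precision_recall(y_true, y_pred)
        precisions.append(p)
        recalls.append(r)
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```

Averaging per-fold precision and recall is one common convention; pooling predictions across folds before computing the measures is an equally reasonable choice for a sketch like this.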
Keywords
Software Product Lines and Platform Ecosystems: Engineering, Services, and Management, Software Technology, Source Code Terms, Program Comprehension
Extent
10 pages
Related To
Proceedings of the 52nd Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International