Assessing Text Representation Methods on Tag Prediction Task for StackOverflow

Date
2023-01-03
Authors
Skenderi, Erjon
Laaksonen, Salla-Maaria
Stefanidis, Kostas
Huhtamäki, Jukka
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
585
Ending Page
Alternative Title
Abstract
A large part of knowledge evolves outside of the operations of an organization. Question and answer online social platforms provide an important source of information to explore the underlying communities. StackOverflow (SO) is one of the most popular question and answer platforms for developers, with more than 23 million questions asked. Organizing and categorizing data is crucial to manage knowledge in such large quantities. Questions posted on SO are assigned a set of tags and textual content of each question may contain coding syntax. In this paper, we evaluate the performance of multiple text representation methods in the task of predicting tags for SO questions and empirically prove the impact of code syntax in text representations. The SO dataset was sampled and questions without code syntax were identified. Two classical text representation methods consisting of BoW and TF-IDF were selected along four other methods based on pre-trained models including Fasttext, USE, Sentence-BERT and Sentence-RoBERTa. Multi-label k'th Nearest Neighbors classifier was used to learn and predict tags based on the similarities between feature-vector representations of the input data. Our results indicate a consistent superiority of the representations generated from Sentence-RoBERTa. Overall, the classifier achieved a 17% or higher improvement on F1 score when predicting tags for questions without any code syntax in content.
Description
Keywords
Text Mining and Analytics, knowledge-intensive work, q&a forums, stackoverlow, tag prediction, text representation
Citation
Extent
10
Format
Geographic Location
Time Period
Related To
Proceedings of the 56th Hawaii International Conference on System Sciences
Table of Contents
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Local Contexts
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.