Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/70943

Predicting Question Deletion and Assessing Question Quality in Social Q&A Sites using Weakly Supervised Deep Neural Networks

File Size Format
0264.pdf 263.51 kB Adobe PDF

Item Summary

Title: Predicting Question Deletion and Assessing Question Quality in Social Q&A Sites using Weakly Supervised Deep Neural Networks
Authors: Ghosh, Souvick
Keywords: Data Analytics, Data Mining and Machine Learning for Social Media
automatic prediction
deep learning
question-answering forums
question deletion
question quality
Date Issued: 05 Jan 2021
Abstract: Community question answering (CQA) sites, which use the power of collective knowledge, have emerged as popular destinations for complex and personalized questions that require human-human interactions and multiple rounds of clarification between the asker and the answerer. In this paper, we undertook a threefold task. First, we developed a deep neural network model to automatically predict the questions that are likely to be deleted by the moderators. Second, we hypothesized that there exists a relationship between the quality of a question and its probability of being deleted by the forum moderators. We developed a deep model using deleted questions and used it for predicting question quality. Our contribution is not limited to developing the predictor model; we also created the gold standard data for question quality assessment. Lastly, we explored the efficiency of different input representations, optimization functions, and neural network models for predicting question quality. When assessing question quality, the results show that combining natural language features with word embeddings can result in better performance (higher recall and F-scores) than word embeddings alone. Our model predicted deleted questions with an accuracy of 97.8% and with precision and true positive rate (TPR) both above 0.95. When assessing question quality, our model obtained a TPR of 0.841 and a precision of 0.514. This research serves as the first step toward automatic content moderation in CQA sites; identifying poor-quality questions would allow askers to improve the quality of the questions they ask and would help moderators handle a large volume of questions during content moderation.
Pages/Duration: 10 pages
URI: http://hdl.handle.net/10125/70943
ISBN: 978-0-9981331-4-0
DOI: 10.24251/HICSS.2021.329
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/
Appears in Collections: Data Analytics, Data Mining and Machine Learning for Social Media


Please email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.

This item is licensed under a Creative Commons License.