Data Analytics, Data Mining and Machine Learning for Social Media

Recent Submissions

  • Item
    User Demographics and Censorship on Sina Weibo
    (2021-01-05) Kenney, Wayne; Leberknight, Christopher
    This paper investigates the relationship between user demographics and the frequency of censored posts (weibos) on Sina Weibo. Our results indicate that demographics such as location, gender and paid-for features offer limited predictive power, but they help explain how censorship is applied on social media. Using a dataset of 226 million weibos collected in 2012, we apply a binomial regression model to evaluate how well user demographics identify users who may be targeted for censorship. Our results suggest that verified male users (those who pay for mobile and security features) are more likely to be censored than female or unverified users. In addition, users from regions such as Hong Kong, Macao, and Beijing were more heavily censored than users from any other region of China over the same period.
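    The binomial regression step can be illustrated with a minimal sketch; it is not the authors' code. It assumes hypothetical aggregated data in which each row is one user group described by an intercept column and two binary indicators (male, verified), together with a count of posts n and a count of censored posts k. The logit-link model is fit by plain gradient ascent on the binomial log-likelihood, whereas statistical packages typically use iteratively reweighted least squares. The function name `fit_binomial_logit` and all data values are invented for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_logit(
    X: Sequence[Sequence[float]],
    n: Sequence[int],
    k: Sequence[int],
    lr: float = 2.0,
    iters: int = 5000,
) -> List[float]:
    """Fit P(censored) = sigmoid(x . beta) by gradient ascent on the mean
    binomial log-likelihood (hypothetical helper, for illustration only).

    X: feature rows, first column = 1 for the intercept
    n: number of posts in each row; k: number of censored posts in each row
    """
    d = len(X[0])
    beta = [0.0] * d
    total = sum(n)
    for _ in range(iters):
        grad = [0.0] * d
        for row, n_i, k_i in zip(X, n, k):
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            resid = k_i - n_i * p  # observed minus expected censored posts
            for j in range(d):
                grad[j] += row[j] * resid
        # Dividing by the total post count keeps the step size scale-free.
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


# Hypothetical aggregated groups: [intercept, male, verified]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
n = [1000, 1000, 1000, 1000]  # posts per group
k = [20, 40, 50, 95]          # censored posts per group

beta = fit_binomial_logit(X, n, k)
for name, b in zip(["intercept", "male", "verified"], beta):
    print(f"{name}: coef={b:.3f}, odds ratio={math.exp(b):.2f}")
```

    A positive coefficient on the male or verified column means higher estimated odds of censorship for that group, and exp(coefficient) is the corresponding odds ratio.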
  • Item
    Predicting Question Deletion and Assessing Question Quality in Social Q&A Sites using Weakly Supervised Deep Neural Networks
    (2021-01-05) Ghosh, Souvick
    Community question answering (CQA) sites, which harness collective knowledge, have become popular destinations for complex and personalized questions that require human-to-human interaction and multiple rounds of clarification between the asker and the answerer. In this paper, we undertook a threefold task. First, we developed a deep neural network model that automatically predicts which questions are likely to be deleted by moderators. Second, we hypothesized that a question's quality is related to its probability of being deleted by forum moderators; we trained a deep model on deleted questions and used it to predict question quality. Our contribution is not limited to the predictor model: we also created gold-standard data for question quality assessment. Lastly, we explored the effectiveness of different input representations, optimization functions, and neural network models for predicting question quality. For question quality assessment, the results show that combining natural language features with word embeddings yields better performance (higher recall and F-scores) than word embeddings alone. Our model predicted deleted questions with an accuracy of 97.8%, with precision and true positive rate (TPR) both above 0.95. For question quality assessment, it obtained a TPR of 0.841 and a precision of 0.514. This research is a first step toward automatic content moderation in CQA sites: identifying poor-quality questions would help askers improve their questions and help moderators handle large volumes of questions.
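    The abstract reports TPR and precision for question quality but no F-score. Because TPR is the same quantity as recall, the F1 score implied by those two figures is their harmonic mean. The sketch below works this out; the derived value is our own arithmetic under the assumption that both figures describe the same positive class, and the paper's own F-scores may be computed differently (for example, averaged over classes). The helper names and the counts in the last example are hypothetical.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall (recall is another name for TPR)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall (TPR) and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Question-quality figures quoted in the abstract.
implied_f1 = f1_score(precision=0.514, recall=0.841)
print(f"implied F1 = {implied_f1:.3f}")  # prints "implied F1 = 0.638"

# Hypothetical counts, only to show the counts-based helper.
print(metrics_from_counts(tp=8, fp=2, fn=2))
```

    The gap between a TPR of 0.841 and a precision of 0.514 means the quality model flags most poor questions at the cost of many false alarms, which pulls the harmonic mean well below the TPR.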
  • Item
    Inferring the Relationship between Anxiety and Extraversion from Tweets during COVID19 – A Linguistic Analytics Approach
    (2021-01-05) Gruda, Dritjon; Ojo, Adegboyega
    We investigate the longitudinal relationship between extraversion and experienced state anxiety in a cohort of Twitter users in New York using a linguistic analytics approach. We find that before COVID-19 was declared a pandemic, highly extraverted individuals experienced lower state anxiety than more introverted individuals, which is in line with previous literature. However, there appear to be no significant differences between individuals after the pandemic announcement, which provides evidence that COVID-19 affects individuals regardless of their trait extraversion. Finally, a longitudinal examination of the data shows that extraversion seems to matter more in the early days of the crisis and toward the end of the examined time range. Throughout the crisis, state anxiety did not seem to vary much between individuals with different extraversion dispositions.
  • Item
    Improving News Popularity Estimation via Weak Supervision and Meta-active Learning
    (2021-01-05) Nashaat, Mona; Miller, James
    Social news has fundamentally changed the mechanisms of public perception, education, and even disinformation. Estimating the popularity of social news articles can have a significant impact through a variety of information redistribution techniques. In this article, we propose an improved algorithm that predicts the long-term popularity of social news articles without the need for ground-truth observations. The proposed framework applies a novel active learning selection policy to obtain the optimal volume of observations and achieve superior predictive performance. To assess the framework, we conducted a large set of experiments; these indicate that the new solution can improve prediction performance by 28% (precision) while reducing the volume of required ground truth by 32%.
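    The abstract does not describe the novel selection policy, so the sketch below substitutes a standard technique, uncertainty sampling, purely to show what an active learning selection step does: given a pool of unlabeled articles and a model's predicted probability that each will become popular, it requests ground truth only for the items the model is least sure about. This is not the authors' meta-active learning policy; the function name, the `budget` parameter and the toy scores are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_for_labeling(
    pool: Sequence[T],
    predict_proba: Callable[[T], float],
    budget: int,
) -> List[int]:
    """Uncertainty sampling (hypothetical helper): return the indices of the
    `budget` pool items whose predicted probability of being popular is
    closest to 0.5."""
    # Distance from the decision boundary; smaller means the model is less certain.
    scored = [(abs(predict_proba(x) - 0.5), i) for i, x in enumerate(pool)]
    scored.sort()  # ties are broken by the lower index
    return [i for _, i in scored[:budget]]


# Toy pool: each "article" is represented directly by the model's score for it.
pool_scores = [0.05, 0.48, 0.90, 0.55, 0.30]
to_label = select_for_labeling(pool_scores, predict_proba=lambda s: s, budget=2)
print(to_label)  # prints [1, 3]
```

    In a full loop, the selected items would be labeled, added to the training set and the model retrained before the next round. In a weakly supervised setting, one common design keeps programmatically generated, noisy labels for the rest of the pool, so ground truth is spent only where the model is uncertain.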
  • Item
    A Group Recommendation Model Using Diversification Techniques
    (2021-01-05) Oliveira, Amanda; Durao, Frederico
    In daily life, groups form naturally, for example when watching a movie with friends or going out for dinner. In such scenarios, recommender systems can help by suggesting items (e.g., movies or restaurants) that satisfy all members of the group rather than a single member. To do so, it is crucial to aggregate the individual preferences of the group members so as to satisfy everyone. Although consensus techniques exist for building the group profile, the resulting recommendations may still be repetitive and overspecialized. This drawback motivates applying diversification techniques to group recommendations. In this paper, we propose a group recommendation model that uses diversification techniques and exploits different aggregation techniques over the group preference matrix. The experiments evaluate both accuracy and diversity for the group recommendations. Results show that our approach achieved a 1.8% increase in diversity and a 3.8% improvement in precision over the compared methods.
  • Item