Data Analytics, Data Mining and Machine Learning for Social Media

Recent Submissions

  • Emotions Trump Facts: The Role of Emotions in on Social Media: A Literature Review
    (2018-01-03) Hyvärinen, Hissu ; Beck, Roman
    Emotions are an inseparable part of how people use social media. While a more cognitive view of social media initially dominated research in areas such as knowledge sharing, the role of emotions on social media is attracting increasing interest. As is typical of an emerging field, there is no synthesized view of what has been discovered so far and, more importantly, of what has not. This paper provides an overview of research on expressing emotions on social media and their impact, and makes recommendations for future research in the area. Considering differentiated emotions instead of measuring only positive or negative sentiment, drawing on theories of emotion, and distinguishing between sentiment and opinion could provide valuable insights in the field.
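The contrast between single-axis sentiment and differentiated emotions can be illustrated with a minimal Python sketch. The six-word `LEXICON` and the functions `polarity` and `emotion_profile` are hypothetical and do not come from the review. They only show that two texts with identical polarity can carry different discrete emotions.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (discrete emotion, polarity).
# Illustrative only; real studies would use a validated emotion resource.
LEXICON = {
    "furious": ("anger", -1),
    "outraged": ("anger", -1),
    "afraid": ("fear", -1),
    "worried": ("fear", -1),
    "happy": ("joy", 1),
    "thrilled": ("joy", 1),
}

def polarity(text: str) -> int:
    """Net positive-minus-negative score: the single-axis sentiment view."""
    return sum(LEXICON[w][1] for w in text.lower().split() if w in LEXICON)

def emotion_profile(text: str) -> Counter:
    """Counts of discrete emotions: the differentiated-emotion view."""
    return Counter(LEXICON[w][0] for w in text.lower().split() if w in LEXICON)

angry = "I am furious and outraged"
scared = "I am afraid and worried"
print(polarity(angry), polarity(scared))
print(emotion_profile(angry), emotion_profile(scared))
```

Both example sentences receive the same negative polarity, yet one expresses anger and the other fear, which is the kind of distinction the review recommends capturing.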
  • Yaks versus Tweets: Sentiment Discrepancy During a Social Crisis
    (2018-01-03) Koohikamali, Mehrdad ; Gerhart, Natalie
    People use social networks to get current information, express their emotions and ideas, and connect with others. During a social crisis, getting information through a social network becomes more valuable. Unfortunately, using a social network during a social crisis also provides fertile ground for uncertainty and the rapid dissemination of misinformation. There are currently multiple types of social networks, including traditional and anonymous ones. This research considers the differences between these two types. During the crisis at the University of Missouri involving ‘Concerned Student 1950’, a student activist group at the university, we captured users’ messages on two distinct social networks, one anonymous and one traditional. Through sentiment analysis of datasets from Twitter and Yik Yak, we find that people express less total sentiment and more extremity on anonymous social networks. Results show that extremity and length positively influence engagement, while total sentiment negatively influences engagement. These findings provide guidance for developers, law enforcement, and social network users.
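The abstract does not define its three message measures. The following minimal sketch makes several assumptions: a hypothetical word-level lexicon (`SCORES`), total sentiment as the sum of absolute word scores, extremity as the largest absolute word score, and length as the token count. Under those assumptions, it shows how one post can carry less total sentiment yet more extremity than another.

```python
from dataclasses import dataclass

# Hypothetical word scores on a -4..+4 scale; not taken from the paper.
SCORES = {"love": 3.0, "good": 1.5, "bad": -1.5, "hate": -3.0, "disgusting": -4.0}

@dataclass
class PostFeatures:
    total_sentiment: float  # assumed: sum of absolute word scores
    extremity: float        # assumed: largest absolute word score in the post
    length: int             # number of whitespace-separated tokens

def features(post: str) -> PostFeatures:
    tokens = post.lower().split()
    magnitudes = [abs(SCORES[t]) for t in tokens if t in SCORES]
    return PostFeatures(
        total_sentiment=sum(magnitudes),
        extremity=max(magnitudes, default=0.0),  # 0.0 when no scored words
        length=len(tokens),
    )

print(features("good food good service bad parking"))
print(features("disgusting"))
```

The first post accumulates more total sentiment from several mild words. The second reaches higher extremity with a single strong word.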
  • Fine Grained Approach for Domain Specific Seed URL Extraction
    (2018-01-03) Sanagavarapu, Lalit Mohan ; Sarangi, Sourav ; Y, Raghu Reddy ; Varma, Vasudeva
    Domain-specific search engines are expected to provide relevant search results. The availability of a large number of URLs across subdomains improves the relevance of domain-specific search engines. Current methods for selecting seed URLs could be made more systematic by ensuring that subdomains are represented. We propose a fine-grained approach for automatically extracting seed URLs at the subdomain level, using Wikipedia and Twitter as repositories. A SeedRel metric and a Diversity Index are proposed to measure seed URL relevance and subdomain coverage. We implemented our approach for the 'Security - Information and Cyber' domain and identified 34,007 seed URLs and 400,726 URLs across subdomains. The measured Diversity Index value of 2.10 confirms that all subdomains are represented; hence, a relevant 'Security Search Engine' can be built. Our approach also extracted more URLs (seed and child) than existing approaches for URL extraction.
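The abstract does not give the Diversity Index formula. For illustration only, the sketch below assumes a Shannon-style index over the share of URLs that fall in each subdomain, H = -Σ p_i ln p_i, together with Pielou's evenness J = H / ln(S) for S subdomains. The function names and subdomain labels are hypothetical.

```python
import math
from collections import Counter

def shannon_diversity(labels):
    """Shannon index H = -sum(p_i * ln(p_i)); p_i is the share of URLs in subdomain i."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def evenness(labels, n_subdomains):
    """Pielou's evenness J = H / ln(S): 1.0 when URLs spread evenly over all S subdomains.

    Passing the full number of subdomains (including ones with no URLs)
    lowers J whenever some subdomain is missing or underrepresented.
    """
    if n_subdomains < 2:
        return 1.0
    return shannon_diversity(labels) / math.log(n_subdomains)

# Hypothetical subdomain labels for a handful of extracted URLs.
urls = ["network", "network", "malware", "crypto", "forensics", "malware"]
print(round(shannon_diversity(urls), 3))
print(round(evenness(urls, n_subdomains=5), 3))
```

Higher H means URLs are spread over more subdomains more evenly. This matches the role the abstract gives the index, although the paper's exact definition may differ.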
  • The 2016 US Presidential Election on Facebook: An Exploratory Analysis of Sentiments
    (2018-01-03) Alashri, Saud ; Srivatsav Kandala, Srinivasa ; Bajaj, Vikash ; Parriott, Emily ; Awazu, Yukika ; C. Desouza, Kevin
    Social media platforms are valuable tools for political campaigns. In this study, we analyze a dataset of over 22 thousand Facebook posts by candidates and over 48 million comments to understand the nature of online discourse. Specifically, we study the interaction between political candidates and the public during the 2016 presidential election in the United States. We outline a novel method to classify commenters into four groups: strong supporters, supporters, dissenters, and strong dissenters. Comments by each group on policy-related topics are analyzed using sentiment analysis. Finally, we discuss avenues for future research on the dynamics of social media platforms and political campaigns.
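The abstract does not describe how the paper classifies commenters. Purely as a hypothetical illustration of a four-way split, the sketch below averages a commenter's per-comment sentiment scores and cuts the average at 0 and at an assumed `strong` threshold of ±0.5. The scores are assumed to lie in [-1, 1] and to be oriented toward one candidate. This is not the authors' method.

```python
from statistics import mean

def classify_commenter(scores, strong=0.5):
    """Hypothetical four-way split of a commenter by average comment sentiment.

    `scores` are per-comment sentiment scores in [-1, 1] toward one candidate.
    The cut points 0 and +/-`strong` are assumptions, not the paper's method.
    """
    avg = mean(scores)
    if avg >= strong:
        return "strong supporter"
    if avg >= 0:
        return "supporter"
    if avg > -strong:
        return "dissenter"
    return "strong dissenter"

def group_commenters(scores_by_commenter, strong=0.5):
    """Map each commenter ID to one of the four groups."""
    return {cid: classify_commenter(s, strong) for cid, s in scores_by_commenter.items()}

# Hypothetical commenter IDs and sentiment scores.
example = {"u1": [0.9, 0.7], "u2": [0.2, 0.1], "u3": [-0.3], "u4": [-0.8, -0.9]}
print(group_commenters(example))
```

Once commenters are grouped, their comments on policy-related topics can be pooled per group for the sentiment comparison the abstract describes.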
  • Automated Generation of Latent Topics on Emerging Technologies from YouTube Video Content
    (2018-01-03) Daniel, Clinton ; Dutta, Kaushik
    Topic modeling has been widely adopted by researchers for a variety of research problems that involve mining text corpora to generate a latent set of topics. In particular, the Latent Dirichlet Allocation (LDA) algorithm is well documented in the academic literature, both in its applications and in automated topic generation from data sources such as blogs, social media, and other text collections. YouTube now offers access to over a billion auto-generated transcripts of videos recorded and posted to its platform. The availability of this data gives researchers an opportunity to investigate the variety of topics being discussed on the platform. Using the LDA algorithm, we study discussions of emerging technologies posted on YouTube to better understand which latent topics can be auto-generated and what kind of methodology can be used to analyze this data.
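The abstract does not say which LDA implementation or inference method the study used. Below is a minimal, self-contained sketch of LDA fitted by collapsed Gibbs sampling, one standard inference method; variational Bayes is another common one. The toy token lists stand in for preprocessed transcripts. The hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, topic_word), where
    topic_word[k][w] is the smoothed estimate of P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    topic_word = [
        [(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
        for k in range(n_topics)
    ]
    return vocab, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]

# Hypothetical, already tokenized "transcripts".
docs = [
    "blockchain ledger crypto token".split(),
    "crypto token blockchain mining".split(),
    "neural network training model".split(),
    "model training neural data".split(),
]
vocab, topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(vocab, topic_word))
```

For real transcript collections, maintained implementations such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` are the practical choice. The loop above only makes the sampling update explicit.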
  • "Leadership in Action: How Top Hackers Behave" A Big-Data Approach with Text-Mining and Sentiment Analysis
    (2018-01-03) Biswas, Baidyanath ; Mukhopadhyay, Arunabha ; Gupta, Gaurav
    This paper examines hacker behavior in dark forums and identifies its significant predictors in light of "leadership theory" for "communities of practice." We combine online forum features with text mining and sentiment analysis of messages. We build a multinomial logistic regression model for role-based hacker classification and validate the model with actual hacker forum data. We identify "total number of messages," "number of threads," "hacker keyword frequency," and "sentiments" as the most significant predictors of expert hacker behavior. We also demonstrate that the hacker community follows the Pareto principle in disseminating technical knowledge. As a recommendation for future research, we build a unique keyword lexicon of the most significant terms derived by the tf-idf measure. Such investigation of hacker behavior is particularly relevant for organizations seeking to prevent cyber-attacks proactively. Foresight into online hacker behavior can help businesses avoid losses from breaches and the additional costs of attack-prevention measures.
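The abstract names tf-idf and the Pareto principle without giving formulas. The sketch below rests on several assumptions. tf is a term's count divided by message length, and idf is ln(N / df). A term's lexicon score is its highest tf-idf across messages. Conformity to the Pareto principle is checked via the share of all contributions made by the top 20% of members, where a share near 80% is the classic Pareto pattern. The function names and toy messages are hypothetical.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, n=5):
    """Rank terms by their highest tf-idf score across documents.

    tf = term count / document length; idf = ln(N / df). Ties sort alphabetically.
    """
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, c in Counter(doc).items():
            score = (c / len(doc)) * math.log(N / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best, key=lambda t: (-best[t], t))[:n]

def top_share(contributions, fraction=0.2):
    """Share of the total contributed by the top `fraction` of members."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical tokenized forum messages and per-member message counts.
forum_posts = [
    ["exploit", "payload", "the"],
    ["the", "forum", "thread"],
    ["the", "exploit", "shell"],
]
print(tfidf_top_terms(forum_posts, n=4))
print(top_share([80, 5, 5, 5, 5]))
```

A word that appears in every message, such as "the", gets idf ln(1) = 0 and falls to the bottom of the lexicon. This down-weighting of ubiquitous words is why tf-idf suits building a keyword list.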
  • Analysis of Elections Using Social Listening in Japan
    (2018-01-03) Goto, Hisaki ; Goto, Yukiko
    An "Obama-style" election campaign that uses social media has now spread and is actively used all over the world. In Japan, however, Internet campaigning was not permitted until 2013, and even after the ban was lifted, regulations still prevent campaigns from fully using social media. On the other hand, because social media are used enthusiastically in Japan, social listening, which gathers information that users post spontaneously, is useful. During the 2016 national election, 1,777,724 postings containing political party names were collected, and the results were successfully predicted. This study analyzes national elections in Japan using social listening, with successful predictions for the proportional representation electoral system.
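The abstract does not describe how the collected postings were turned into predictions. Purely as a hypothetical illustration, the sketch below counts the postings that mention each party and treats those counts as a proxy for vote share. It then allocates proportional representation seats with the D'Hondt highest-averages method, which Japan uses for its proportional representation seats. The party names, postings, and the mention-share proxy are all assumptions.

```python
from collections import Counter

def mention_counts(posts, parties):
    """Count the posts that mention each party name (one post can count for several)."""
    counts = Counter()
    for post in posts:
        for party in parties:
            if party in post:
                counts[party] += 1
    return counts

def dhondt(votes, seats):
    """Allocate seats by the D'Hondt highest-averages method.

    Each seat goes to the party with the largest quotient votes / (seats_won + 1).
    Ties are broken by the iteration order of `votes`.
    """
    alloc = {party: 0 for party in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda p: votes[p] / (alloc[p] + 1))
        alloc[winner] += 1
    return alloc

# Hypothetical party names and postings, for illustration only.
posts = ["Alpha rally downtown", "Alpha vs Beta debate", "Gamma policy speech", "Alpha wins poll"]
share = mention_counts(posts, ["Alpha", "Beta", "Gamma"])
print(dhondt(share, seats=4))
```

Plain substring matching is the simplest possible matcher. Real Japanese postings would need proper tokenization and handling of party abbreviations.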
  • Introduction to the Minitrack on Data Analytics, Data Mining and Machine Learning for Social Media
    (2018-01-03) Yates, David J ; Xu, Jennifer ; Haughton, Dominique