Data Analytics and Data Mining for Social Media Minitrack

Social media is changing how we work and play. It is also changing the way we access and consume media, stay in touch with family and friends, and communicate within our on-line communities. What these activities have in common is that they generate a tremendous volume of data that can be analyzed and mined for both research and commercial purposes. This minitrack focuses on research that brings together social media (or social networks) and data analytics and data mining. We welcome quantitative, theoretical or applied papers whose approaches fall within the scope of data analytics and data mining and closely related areas (e.g., data warehousing, content mining, network analysis, structure mining, business intelligence and knowledge discovery).

Topics of interest include (but are not limited to):

  • Discovery, collection and extraction of Social Media data
  • Text- or image-based mining of Social Media content
  • Opinion mining, sentiment analysis and recommendation analysis
  • Cleaning, curation and provenance of data in social networks
  • Social Network Analysis; exploration of massive social networks
  • Identifying and profiling influential participants, subgroups and communities
  • Crowd or cloud computation on Social Media data
  • Predictive and forecasting analytics based on Social Media content
  • Trend analysis to identify emerging terms, topics and ideas
  • Visual analysis of web network structure, usage and content
  • Semantic representations of on-line content, link analysis and linkages
  • Social search, retrieval and ranking
  • Analysis of web-based collective intelligence
  • Performance and scalability of Social Media data management
  • Social innovation and effecting change through Social Media

Minitrack Co-Chairs:

David Yates (Primary Contact)
Bentley University

Jennifer Xu
Bentley University

Dominique Haughton
Bentley University

Xiangbin Yan
Harbin Institute of Technology, China


Recent Submissions

  • Item
    Media Professionals’ Opinions about Interactive Visualizations of Political Polarization during Brazilian Presidential Campaigns on Twitter
    ( 2017-01-04) Queiroz Santos, Caroline ; Cunha, Holisson ; Teixeira, Carlos ; Ramos de Souza, Daniele ; Tietzmann, Roberto ; Manssour, Isabel ; Selbach Silveira, Milene ; Träsel, Marcelo ; Dubugras Alcoba Ruiz, Duncan ; Barros, Rodrigo
    Interactive data visualization techniques are an important way to obtain information from large datasets. Data journalism is an emerging area that strongly makes use of such techniques. In this work we investigate the relationship between journalists (and media professionals) in their job routine and data visualization, with the main goal of understanding if these professionals know and use data visualization tools in their job context, as well as if they consider these resources to be important. For this, we present the results of a survey made with journalists and media professionals to analyze how interactive visualizations could help them to get insight or knowledge of such data, and if their use may improve and support these professionals' activities. The results indicate that visualization and data analysis tools are still not easily accessible by those professionals, and therefore still less influential than they could be. However, most participants considered data visualization a valuable resource in their news production routines. As a contribution, we also identified positive points and understanding gaps of visualizations, as well as the perception of journalists and media professionals about getting information from data visualization.
  • Item
    Lariat: A Visual Analytics Tool for Social Media Researchers to Explore Twitter Datasets
    ( 2017-01-04) Chen, Nan-Chen ; Brooks, Michael ; Kocielnik, Rafal ; Hong, Sungsoo (Ray) ; Smith, Jeff ; Lin, Sanny ; Qu, Zening ; Aragon, Cecilia
    Online social data is potentially a rich source of insight into human behavior, but the sheer size of these datasets requires specialized tools to facilitate social media research. Visual analytics tools are one promising approach, but calls have been made for more in-depth studies in specific application domains to contribute to the design of such tools. We conducted a formative study to better understand the needs of social media researchers, and created Lariat, a visual analytics tool that facilitates exploratory data analysis through integrated grouping and visualization of social media data. The design of Lariat was informed by the results of the formative study and sensemaking theory, both indicating that the exploratory processes require search, comparison, verification, and iterative refinement. Based on our results and the evaluation of Lariat, we identify a number of design implications for future visual analytics tools in this domain.
  • Item
    Exploring Time Series Spectral Features in Viral Hashtags Prediction
    ( 2017-01-04) Doong, Shing ; Chung, Daniel
    Viral hashtags spread across a large population of Internet users very quickly. Previous studies use features mostly in an aggregate sense to predict the popularity of hashtags, for example, the total number of hyperlinks in early tweets adopting a tag. Since each tweet is time stamped, many aggregate features can be decomposed into fine-grained time series such as a series of numbers of hyperlinks in early adopting tweets. This research utilizes frequency domain tools to analyze these time series. In particular, we apply scalogram analysis to study the series of adoption time lapses and the series of mentions and hyperlinks in early adopting tweets. Besides continuous wavelet transforms (CWTs), we also use fast wavelet transforms (FWTs) to analyze the time series. Through experiments with two sets of tweets collected in different seasons, out-of-sample cross validations show that wavelet spectral features can generally improve the prediction performance, and discrete FWT yields results as good as the more complicated CWT-based methods with scalogram analysis.
  • Item
    Buzz vs. Sales: Big Social Data Analytics of Style Icon Campaigns and Fashion Designer Collaborations on H&M’s Facebook Page
    ( 2017-01-04) Komtesse af Rosenborg, Desiree Christina ; Buhl-Andersen, Ida ; Nilsson, Line Bygvrå ; Rebild, Mark Philip ; Mukkamala, Raghava Rao ; Hussain, Abid ; Vatrapu, Ravi
    This paper examines the relationship between social media engagement and financial performance of the global fast fashion company, H&M. We analyze big social data from Facebook on the seven H&M style collections that occurred during 2012 and 2013 to investigate if style icon campaigns have a larger effect on quarterly sales than designer collaborations. We find that style icons such as David Beckham generate more social buzz than designer collaborations. Social Set Analysis of the Facebook data shows that the overlap between the users H&M reaches with its different style collections is fairly small. The deviations between forecasted quarterly sales and actual quarterly sales are analyzed. Our results show that style icon campaigns have a larger impact on sales than designer collaborations and reveal that the quarters with the largest deviations coincide with the quarters in which H&M ran a style icon campaign. We discuss the implications of our findings and outline directions for future research.
  • Item
    Birds of a Feather Talk Together: User Influence on Language Adoption
    ( 2017-01-04) Kersgaw, Daniel ; Rowe, Matthew ; Noulas, Anastasios ; Stacey, Patrick
    Language is in constant flux, from changes in meaning to the introduction of new terms. At the user level it changes as users accommodate their language to those with whom they are in contact. By mining diffusions of new terms across social networks we detect the influence between users and communities. This is then used to compute the user activation threshold at which they adopt new terms dependent on their neighbours. We apply this method to four different networks from two popular on-line social networks (Reddit and Twitter). This research highlights novel results: by testing the network through random shuffles we show that the time at which a user adopts a term is dependent on the local structure; however, a large part of the influence comes from the global structure, and influence between users and communities is not significantly dependent on network structures.
  • Item
    A Peer-Based Approach on Analyzing Hacked Twitter Accounts
    ( 2017-01-04) Murauer, Benjamin ; Zangerle, Eva ; Specht, Günther
    Social media has become an important part of the lives of its hundreds of millions of users. Hackers make use of the large target audience by sending malicious content, often by hijacking existing accounts. This phenomenon has prompted widespread research on how to detect hacked accounts, and different approaches exist. This work sets out to analyze the possibilities of including the reactions of hacked Twitter accounts’ peers in a detection system. Based on a dataset of six million tweets crawled from Twitter over the course of two years, we select a subset of tweets in which users react to alleged hacks of other accounts. We then gather and analyze the responses to those messages to reconstruct the conversations made. A quantitative analysis of these conversations shows that 30% of the users that are allegedly being hacked reply to the accusations, suggesting that these users acknowledge that their account was hacked.
  • Item
    A Large-scale Analysis of the Marketplace Characteristics in Fiverr
    ( 2017-01-04) Maity, Suman Kalyan ; Jha, Chandra Bhanu ; Kumar, Avinash ; Sengupta, Ayan ; Modi, Madhur ; Mukherjee, Animesh
    Crowdsourcing platforms have become quite popular due to the increasing demand for human computation-based tasks. Though crowdsourcing systems such as MTurk are primarily demand-driven, supply-driven marketplaces are becoming increasingly popular. Fiverr is a fast-growing supply-driven marketplace where sellers post micro-tasks (gigs) and users purchase them for prices as low as $5. In this paper, we study the Fiverr platform as a unique marketplace and characterize the sellers, buyers and the interactions among them. We find that sellers are more appeasing in their interactions and try to woo their buyers into buying their gigs. There are many small, tightly-knit communities in the seller-seller network whose members support each other. We also study Fiverr as a seller-driven marketplace in terms of sales, churn rates, competitiveness among various subcategories, etc., and observe that while there are certain similarities with common marketplaces, there are also many differences.
  • Item
    A Domain Oriented LDA Model for Mining Product Defects from Online Customer Reviews
    ( 2017-01-04) Qiao, Zhilei ; Zhang, Xuan ; Zhou, Mi ; Wang, Gang Alan ; Fan, Weiguo
    Online reviews provide important demand-side knowledge for product manufacturers to improve product quality. However, discovering and quantifying potential product defects from large amounts of online reviews is a nontrivial task. In this paper, we propose a Latent Product Defect Mining model that identifies critical product defects. We define domain-oriented key attributes, such as components and keywords used to describe a defect, and build a novel LDA model to identify and acquire integral information about product defects. We conduct comprehensive quantitative and qualitative evaluations to ensure the quality of the discovered information. Experimental results show that the proposed model outperforms the standard LDA model and could find more valuable information. Our research contributes to the extant product quality analytics literature and has significant managerial implications for researchers, policy makers, customers, and practitioners.
  • Item
    Introduction to Data Analytics and Data Mining for Social Media Minitrack
    ( 2017-01-04) Yates, David ; Xu, Jennifer ; Haughton, Dominique ; Yan, Xiangbin