Data Analytics, Data Mining and Machine Learning for Social Media

Identifying Citation Sentiment and its Influence while Indexing Scientific Papers

(2020-01-07) Ghosh, Souvick; Shah, Chirag

Sentiment analysis is a popular research area for analyzing social media texts, newspaper articles, and product reviews. However, sentiment analysis of citation instances is relatively unexplored. For scientific papers, the sentiment associated with citation instances is often assumed to be inherently positive. This assumption stems from the hedged nature of sentiment in citations, which makes it difficult to identify and classify. As a result, most existing indexes focus only on citation frequency. In this paper, we highlight the importance of considering citation sentiment when preparing ranking indexes for scientific literature. We perform automatic sentiment classification of citation instances on the ACL Anthology collection of papers. Next, we use the sentiment score in addition to citation frequency to build a ranking index for this collection. Using various baselines, we highlight the impact of our index on the ACL Anthology collection. Our research contributes toward building a more sentiment-sensitive ranking index that better reflects the influence and usefulness of research papers.
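The idea of ranking by sentiment as well as citation frequency can be illustrated with a minimal sketch. The abstract does not give the actual scoring scheme, so the weights in `SENTIMENT_WEIGHT`, the function name `rank_papers`, and the toy citation data are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical weights: the abstract does not state how sentiment is scored.
SENTIMENT_WEIGHT = {"positive": 1.0, "neutral": 0.5, "negative": -1.0}


def rank_papers(citations):
    """Rank cited papers by the sum of their sentiment-weighted citations.

    citations: iterable of (cited_paper_id, sentiment_label) pairs.
    Returns a list of (paper_id, score, citation_count) tuples,
    highest score first, ties broken by paper id.
    """
    score = defaultdict(float)
    count = defaultdict(int)
    for paper, label in citations:
        score[paper] += SENTIMENT_WEIGHT[label]
        count[paper] += 1
    rows = [(paper, score[paper], count[paper]) for paper in count]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Toy data: paper A is cited more often, but mostly negatively.
citations = [
    ("A", "positive"), ("A", "negative"), ("A", "negative"),
    ("B", "positive"), ("B", "neutral"),
]
ranking = rank_papers(citations)
for paper, paper_score, paper_count in ranking:
    print(paper, paper_score, paper_count)
```

In this toy case a frequency-only index would place A first (three citations against two), while the sentiment-weighted score places B first.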

Success Factors of Donation-Based Crowdfunding Campaigns: A Machine Learning Approach

(2020-01-07) Alazazi, Massara; Wang, Bin; Allan, Tareq

Crowdfunding has emerged as an alternative to traditional financing mechanisms, in which individuals solicit financial capital or donations from the crowd. The success factors of crowdfunding are not well understood, particularly for donation-based crowdfunding platforms. This study identifies key drivers of donation-based crowdfunding campaign success using a machine learning approach. Based on an analysis of crowdfunding campaigns from Gofundme.com, we show that our models predicted the average daily amount received with high accuracy using variables available at the beginning of the campaign and the number of days it had been posted. In addition, Facebook and Twitter shares and the number of likes improved the accuracy of the models. Among the six machine learning algorithms we used, support vector machine (SVM) performed best in predicting campaign success.

Evaluation of VI Index Forecasting Model by Machine Learning for Yahoo! Stock BBS Using Volatility Trading Simulation

(2020-01-07) Sasaki, Kodai; Suwa, Hirohiko; Ogawa, Yuki; Umehara, Eiichi; Yamashita, Tatsuo; Tsubouchi, Kota

Risk avoidance is crucial in investment and asset management. One commonly used risk index is the VI index. Suwa et al. (2017) analyzed stock bulletin board messages and predicted its rise. In our study, we developed a simulation of trading Nikkei stock index options using intra-day data and verified the validity of the VI index prediction model proposed by Suwa et al. For the period from November 18, 2014, to June 29, 2016, we conducted a simulation using a long straddle strategy. The profit and loss from trading on the instructions of their model was +3,021 yen, compared with -3,590 yen for the benchmark, an improvement of +6,611 yen. We therefore confirmed that Suwa et al.'s VI index prediction model might be effective.
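As a rough illustration of the strategy named above: a long straddle buys a call and a put at the same strike, so it gains when the underlying moves far in either direction. The sketch below computes the payoff at expiry only, which is a simplification (the study used intra-day data and may have closed positions earlier). The function name and the example prices are hypothetical. The final lines reproduce the improvement figure from the two profit and loss values reported in the abstract.

```python
def long_straddle_pnl(strike, call_premium, put_premium, settle_price):
    """Profit and loss at expiry of one long call plus one long put at the same strike."""
    call_payoff = max(settle_price - strike, 0.0)
    put_payoff = max(strike - settle_price, 0.0)
    return call_payoff + put_payoff - (call_premium + put_premium)


# Hypothetical prices: a large move in either direction is profitable,
# while no move loses both premiums.
print(long_straddle_pnl(100, 3, 2, 110))
print(long_straddle_pnl(100, 3, 2, 100))

# Figures reported in the abstract (yen).
model_pnl = 3021
benchmark_pnl = -3590
improvement = model_pnl - benchmark_pnl
print(improvement)
```

The reported improvement is simply the difference between the two results: 3,021 - (-3,590) = 6,611 yen.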

Using Data Analytics to Filter Insincere Posts from Online Social Networks A Case Study: Quora Insincere Questions

(2020-01-07) Al-Ramahi, Mohammad; Alsmadi, Izzat

The internet in general, and Online Social Networks (OSNs) in particular, continue to play a significant role in our lives, with information uploaded and exchanged on a massive scale. Given this importance and attention, abuse of these communication media for various purposes is common. Driven by goals such as marketing and financial gain, some users use OSNs to post misleading or insincere content. In this context, we used a real-world dataset published by Quora on Kaggle.com to evaluate different mechanisms and algorithms for filtering insincere and spam content. We evaluated different preprocessing and analysis models. Moreover, we analyzed the cognitive effort users put into writing their posts and whether it can improve prediction accuracy. We report the best models in terms of insincerity prediction accuracy.

The New Window to Athletes’ Soul – What Social Media Tells Us About Athletes’ Performances

(2020-01-07) Gruettner, Arne; Vitisvorakarn, Min; Wambsganss, Thiemo; Rietsche, Roman; Back, Andrea

Professional sports has evolved from a game to an organization that has been codified, strategized, and commercialized. One factor that is shaping the sports industry is the pervasiveness of social media. On the one hand, social media is used as a powerful medium for distributing and getting news, engaging in topical discussions, and empowering brands. On the other hand, social media has become a crucial mouthpiece for athletes to interact with peers and share opinions, thoughts, and feelings. However, millions of followers, tweets, and likes later, researchers, practitioners, and athletes alike ask whether social media has an impact on an athlete’s performance. We conducted a social media usage analysis and a sentiment analysis of 124,341 tweets from 31 tennis athletes. We linked these data to 8,095 corresponding match day performances. The results show that high social media usage has a significant negative impact on athletes’ performance.

Generalized Blockmodeling of Multi-Valued Networks

(2020-01-07) Brown, Nathanael; Nozick, Linda

This research presents an extension to generalized blockmodeling where there are more than two types of objects to be clustered based on valued network data. We use the ideas in homogeneity blockmodeling to develop an optimization model to perform the clustering of the objects and the resulting partitioning of the ties so as to minimize the inconsistency of an empirical block with an ideal block. The ideal block types used in this modeling are null (all zeros), complete (all ones) and valued. Two case studies are presented: the Southern Women dataset and a larger example using a subset of the IMDb movie dataset.
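To make the notion of inconsistency between an empirical block and an ideal block concrete, here is a minimal sketch. It assumes a sum-of-squares measure in the spirit of homogeneity blockmodeling, with tie values scaled to [0, 1], and it takes the block mean as the target for a valued block. The abstract does not specify the measure used in the optimization model, so the function `block_inconsistency` and these choices are assumptions.

```python
def block_inconsistency(block, ideal):
    """Sum of squared deviations of a non-empty empirical block from an ideal block.

    block: list of rows of tie values, assumed to be scaled to [0, 1].
    ideal: "null" (all zeros), "complete" (all ones), or "valued"
           (all ties equal to the block mean; an assumption of this sketch).
    """
    values = [value for row in block for value in row]
    if ideal == "null":
        target = 0.0
    elif ideal == "complete":
        target = 1.0
    elif ideal == "valued":
        target = sum(values) / len(values)
    else:
        raise ValueError(f"unknown ideal block type: {ideal}")
    return sum((value - target) ** 2 for value in values)


# Toy block with a single tie: close to a null block, far from a complete one.
block = [[0, 0], [0, 1]]
for ideal in ("null", "complete", "valued"):
    print(ideal, block_inconsistency(block, ideal))
```

A clustering procedure would sum such block-level values over all blocks of a candidate partition and search for the partition that minimizes the total.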

Measuring and Unpacking Affective Polarization on Twitter: The Role of Party and Gender in the 2018 Senate Races

(2020-01-07) Mentzer, Kevin; Fallon, Kate; Prichard, Janet; Yates, David

This study examines how the Twittersphere talked about candidates running for the U.S. Senate in the 2018 congressional elections. We classify Twitter users as Liberal or Conservative to better understand how the two groups use social media during a major national political election. Using tweet sentiment, we assess how the Twittersphere felt about in-group party versus out-group party candidates. When we further break these findings down based on the candidates’ gender, we find that male senatorial candidates were talked about more positively than female candidates. We also find that Conservatives talked more positively about female Republican candidates than they did about Republican male candidates. Female candidates of the out-group party were talked about the least favorably of all candidates. Conservative tweeters exhibit the most positive level of in-group party sentiment and the most negative level of out-group party sentiment. We therefore attribute the most intense affective polarization to this ideological group.

Understanding the Mood of Social Media Messages

(2020-01-07) Power, Robert; Robinson, Bella; Dennett, Amanda; Jin, Brian; Paris, Cecile

Social media is a valuable source of information when seeking to understand community opinion and sentiment about issues of public interest. Such analysis is usually based on sentiment or emotion processing that either uses machine learning techniques or references a curated lexicon of words to measure the emotive intensity being expressed. The lexicon approach can be limited by the sparsity problem, where the lexicon words are not present in the text being processed, and by context issues, where the lexicon words have different meanings in the domain under investigation. We have developed a novel technique based on word embeddings to mitigate these issues and present a case study showing its application, in which the mood expressed by the community on social media about the Centenary of Armistice in Australia was determined in near real time.
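One generic way word embeddings can soften the sparsity problem is to let a word that is missing from the lexicon borrow the intensity of its nearest lexicon neighbour in embedding space. The sketch below illustrates that general idea only; it is not the technique developed by the authors, which the abstract does not describe. The toy vectors, the lexicon values, the function names, and the similarity threshold of 0.8 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_intensity(word, lexicon, embeddings, threshold=0.8):
    """Return the lexicon intensity of a word, or a similarity-scaled
    intensity borrowed from its nearest lexicon neighbour."""
    if word in lexicon:
        return lexicon[word]
    if word not in embeddings:
        return 0.0
    best_word, best_sim = None, threshold
    for lex_word in lexicon:
        if lex_word in embeddings:
            sim = cosine(embeddings[word], embeddings[lex_word])
            if sim >= best_sim:
                best_word, best_sim = lex_word, sim
    return lexicon[best_word] * best_sim if best_word else 0.0


# Hypothetical two-dimensional embeddings and intensities, for illustration only.
LEXICON = {"sad": -0.8, "proud": 0.9}
EMBEDDINGS = {
    "sad": (1.0, 0.0),
    "sorrowful": (0.9, 0.1),
    "proud": (0.0, 1.0),
    "table": (0.7, 0.7),
}

for word in ("sad", "sorrowful", "table"):
    print(word, word_intensity(word, LEXICON, EMBEDDINGS))
```

Here "sorrowful" is absent from the lexicon but lies close to "sad", so it receives a negative intensity, while "table" is not close enough to any lexicon word and scores zero.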

Context Map Analysis of Fake News in Social Media: A Contextualized Visualization Approach

(2020-01-07) Seref, Onur; Seref, Michelle; Hong, Sukhwa

Visualization tools in text analytics are typically based on content analysis, using n-gram frequencies or topic models which output commonly used words, phrases, or topics in a text corpus. However, the interpretation of these visual outputs and summary labels can be incomplete or misleading when words or phrases are taken out of context. We use a novel Context Map approach to create a connected network of n-grams by considering the frequency with which they are used together in the same context. We combine network optimization techniques with embedded representation models to generate a visualization interface with clearer and more accurate interpretation potential. In this paper, we apply our Context Map method to analyze fake news in social media. We compare news article veracity (true versus false news) with orientation (left, mainstream, or right). Our approach provides a rich context analysis of the language used in true versus fake news.
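The first step described above, linking n-grams by how often they share a context, can be sketched as a weighted edge list. This is a minimal illustration that assumes a context is a single sentence and that tokens are split on whitespace. It does not cover the network optimization or embedding steps, and the function names and sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cooccurrence_edges(contexts, n=1):
    """Count, for each pair of n-grams, the number of contexts containing both.

    contexts: iterable of strings, each treated as one context.
    Returns a Counter keyed by alphabetically ordered (ngram_a, ngram_b) pairs.
    """
    edges = Counter()
    for text in contexts:
        grams = sorted(set(ngrams(text.lower().split(), n)))
        for gram_a, gram_b in combinations(grams, 2):
            edges[(gram_a, gram_b)] += 1
    return edges


# Hypothetical contexts, for illustration only.
contexts = [
    "fake news spreads fast",
    "fake news misleads",
    "real news informs",
]
edges = cooccurrence_edges(contexts)
for pair, weight in edges.most_common(3):
    print(pair, weight)
```

The resulting edge weights could then be passed to a graph library for layout and pruning; that step is left out here.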

Detecting Political Bots on Twitter during the 2019 Finnish Parliamentary Election

(2020-01-07) Rossi, Sippo; Rossi, Matti; Upreti, Bikesh; Liu, Yong

In recent years, political discussion has been dominated by the impact of bots used for manipulating public opinion. A number of sources have reported a widespread presence of political bots on social media sites such as Twitter. Compared to other countries, the influence of bots in Finnish politics has received little attention from media and researchers. This study aims to investigate the influence of bots on Finnish political Twitter, based on a dataset consisting of the accounts following major Finnish politicians before the Finnish parliamentary election of 2019. To identify the bots, we extend existing models with user-level metadata and state-of-the-art classification models. The results support our model as a suitable instrument for detecting Twitter bots. We found that, although a large number of bot accounts follow major Finnish politicians, this is unlikely to result from foreign entities’ attempts to influence the Finnish parliamentary election.
