Data Analytics, Data Mining and Machine Learning for Social Media

Recent Submissions

  • Item
    Who shapes crisis communication on Twitter? An analysis of German influencers during the COVID-19 pandemic
    (2022-01-04) Shahi, Gautam Kishore; Clausen, Sünje; Stieglitz, Stefan
    Twitter is becoming an increasingly important platform for disseminating information during crises such as the COVID-19 pandemic. Effective crisis communication on Twitter can shape public perception of the crisis, influence adherence to preventive measures, and thus affect public health. Influential accounts are particularly important because they reach large audiences quickly. This study identifies influential German-language accounts in almost 3 million German tweets collected between January and May 2020 by constructing a retweet network and calculating PageRank centrality values. We capture the volatility of crisis communication by dividing the analysis into seven stages based on key events during the pandemic, and we classify the influential accounts into roles. Our analysis shows that news and journalist accounts were influential throughout all stages, while government accounts were particularly important shortly before and after the lockdown was imposed. We discuss implications for crisis communication during health crises and for analyzing long-term crisis data.
  • Item
    Social Media Mining in Drug Development Decision Making: Prioritizing Multiple Sclerosis Patients’ Unmet Medical Needs
    (2022-01-04) Koss, Jonathan; Bohnet-Joschko, Sabine
    Pharmaceutical companies increasingly must consider patients’ needs in drug development. Because patients’ needs are often difficult to measure, especially in rare diseases, the information available for drug development decision-making is limited. In this study, we employ the opportunity algorithm to identify and prioritize the unmet medical needs of multiple sclerosis patients as shared in social media posts. The features for the opportunity algorithm are generated using topic modeling and sentiment analysis. The results indicate that sensory problems, pain, mental health problems, fatigue, and sleep disturbances represent the highest unmet medical needs of the sampled population. The study suggests that this method has promising potential to provide relevant insights into rare disease populations and to promote patient-centered drug development.
  • Item
    Perception Analysis: Pro- and Anti- Vaccine Classification with NLP and Machine Learning
    (2022-01-04) Okpala, Izunna; Romera Rodriguez, Guillermo; Zheng, Weibing; Halse, Shane; Kropczynski, Jess
    Online discussion of the ongoing pandemic exemplifies the extent and complexity of the information required to understand human perception. Social media has proven to be a viable medium for identifying actionable data and analyzing public perception. As health sectors around the world struggled to obtain accurate information about COVID-19, this research focused on gauging public perceptions of the vaccine, since public perception shapes how the vaccine is received. This study explores how machine learning can be used to understand human perceptions in the context of the COVID-19 vaccine. Natural Language Processing (NLP) was employed to detect pro- and anti-vaccine tweets, and two machine learning classification models were used to study the patterns derived from the analysis. The study analyzed people's perceptions of the vaccine by presenting results for a geographic region while learning patterns that are likely to be associated with pro- or anti-vaccine perceptions.
  • Item
    Multi-National Topics Maps for Parliamentary Debate Analysis
    (2022-01-04) Schaal, Markus; Davis, Enno; Mueller, Roland M.
    In recent years, automated processing of political text has become indispensable for providing automatic access to political debate. During the worldwide COVID-19 pandemic, this need became visible not only in the social sciences but also in public opinion. We provide a path to operationalize this need in a multilingual, topic-oriented manner. Using a publicly available data set of parliamentary speeches, we create a novel process pipeline that identifies a good reference model and links national topics to cross-national topics. We use design science research to create this process pipeline as an artifact.
  • Item
    False Rumor (Fake) and Truth News Spread During A Social Crisis
    (2022-01-04) Koohikamali, Mehrdad; Gerhart, Natalie
    During a social crisis, the truthfulness of information becomes very important, particularly in determining whether the information will spark extreme social engagement. We test a research model to examine the major determinants of message spread during the 2016 protests in Charlotte, North Carolina, which occurred after false online rumors spread about the shooting of Keith Lamont Scott. We hypothesize relationships between message spread (retweets) and message extremity, negative emotions (sadness and fear), social ties (reciprocal replies and location proximity), and Twitter experience. Using Poisson regression, we evaluate and compare two separate models, one for rumors and one for truths. The results indicate that rumors and truths spread differently: more extreme messages spread less if they are truths, and fear is not related to the spread of rumors. The study provides theoretical and practical insights for current research on information diffusion and social engagement.
  • Item
    Do Sequels Outperform or Disappoint? Insights from an Analysis of Amazon Echo Consumer Reviews
    (2022-01-04) Shim, Kyong Jin; Lo, Siaw Ling; Liew, Su Yee
    Rapid technological advances in recent years have drastically transformed our world. With modern inventions such as smartphones, smart watches, and smart home devices, consumers of electronic devices experience greatly improved automation, productivity, and efficiency when carrying out routine daily tasks, searching for information, shopping, and finding entertainment. In the last few years, the global smart speaker market has grown significantly. As technology continues to advance and smart speakers gain innovative features, the adoption of smart speakers will increase, and so will consumer expectations. This paper presents an aspect-specific sentiment analysis of consumer reviews of the first three generations of Amazon Echo. Our text mining and aspect-specific sentiment analyses reveal that price, sound, smart home, connectivity, and comparison are outperforming aspects, whereas voice, app, Q&A, companionship, and shelf life are disappointing and sunsetting aspects. Our study demonstrates a novel cross-generation visualization of directional changes in consumer sentiment using Bollinger Bands and volume charts.
  • Item
    "Don’t Downvote A$$$$$$s!!": An Exploration of Reddit’s Advice Communities
    (2022-01-04) Cannon, Emily; Crouse, Bianca; Ghosh, Souvick; Rihn, Nicholas; Chua, Kristen
    Advice forums are a crowdsourced way to reinforce cultural norms and moral behavior. Sites like Reddit contain massive amounts of natural-language human interaction, with rules and norms unique to each subreddit community. To explore this data, we created a dataset of the top 1,000 posts from each of two such forums, r/AmItheAsshole and r/relationships, and extracted natural language features, including sentiment, similarity, word frequency, and demographics, using both algorithmic and manual methods. We also developed a method to extract demographic information from the subreddits, examined how post authors’ self-disclosures reflect the unique communities in which their posts are shared, and discussed how the authors’ language choices might relate to broader social patterns. We observed some differences between the subreddits in word frequency, demographic disclosure, and gendered language. In general, both subreddits had more female than male posters, and posters tended to use more words about the opposite gender than about their own. Gender-diverse posters were uncommon. Implications for future research include a more careful, inclusive focus on identity and disclosure and on how these interact with advice-seeking behavior in online communities.
  • Item
    Determining Link Relevancy in Tweets Related to Multiple Myeloma Using Natural Language Processing
    (2022-01-04) Van Hoven, Sean; Thoms, Brian; Botts, Nathan
    Social media platforms continue to play a leading role in the evolution of how people share and consume information. Information is no longer limited to updates from a user’s immediate social network but has expanded to an abstract network of feeds from across the global internet. Within the health domain, users rely on social media to research symptoms of illnesses and the myriad therapies posted by others with similar conditions. Whereas in the past a single user may have received information from a limited number of local sources, a user can now subscribe to information feeds from around the globe and receive real-time updates on information important to their health. Yet how do users know whether the information they receive is relevant? In this age of fake news and widespread disinformation, the global domain of medical knowledge can be difficult to navigate. Both legitimate and illegitimate practitioners leverage social media to spread information beyond their immediate networks in order to reach, sway, and enlist a larger audience. In this research, we develop a system that determines the relevancy of linked webpages by combining web mining via Twitter hashtags with natural language processing (NLP).
  • Item
    A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces
    (2022-01-04) Vianna, Daniela; Marian, Amelie
    Personal digital traces are constantly produced by connected devices, internet services, and interactions. These digital traces are typically small, heterogeneous, and stored in various locations in the cloud or on local devices, making it challenging for users to interact with and search their own data. By adopting a multidimensional data model based on the six natural questions (what, when, where, who, why, and how) to represent and unify heterogeneous personal digital traces, we propose a learning-to-rank approach that uses the state-of-the-art LambdaMART algorithm and frequency-based features exploiting the correlation between content (what), users (who), time (when), location (where), and data source (how) to improve the accuracy of search results. Because no personal training data is publicly available, we build our own training sets by combining known-item query generation techniques with an unsupervised ranking model (field-based BM25). Experiments on a publicly available email collection and on a personal digital trace collection from a real user show that the frequency-based learning approach improves search accuracy compared with traditional search tools.
  • Item