Data Analytics, Data Mining, and Machine Learning for Social Media

Recent Submissions

  • Item
    The Trending Customer Needs (TCN) Dataset: A Benchmarking and Automated Evaluation Approach for New Product Development
    (2023-01-03) Kilroy, David; Caton, Simon; Healy, Graham
    In recent years, there have been many studies which summarize User Generated Content as lists of ranked keyphrases representing customer needs for the purposes of New Product Development. However, methods for the evaluation of keyphrase lists do not robustly assess solutions for these purposes. Therefore, in this paper we present the “Trending Customer Needs” (TCN) dataset of over 9000 top trending customer need keyphrases organized by month from 2007-2021 which spans 37 product categories in the area of Consumer Packaged Goods (e.g. toothpaste, eyeliner, beer etc.). TCN is a curated dataset for the benchmarking of supervised machine learning approaches in the prediction of customer needs using User Generated Content. We describe the process of curating TCN while ensuring its quality. Finally, we demonstrate its utility via a case study of Reddit discourse as a potential predictor for future customer needs in Consumer Packaged Goods.
  • Item
    Visualization of POI Category on the Dynamic Rasterized Map Tiles from Geo-Tagged Social Media (Twitter) with SZ-GAT
    (2023-01-03) Xie, Huaze; Li, Da; Wang, Yuanyuan; Kawai, Yukiko
    Spatial zooming graph attention networks (SZ-GAT) is an emerging framework to improve the quality of recommended places visualization on the map. With the advent of location sharing on social networks via mobile devices, the geographic characteristics of the user's points of interest (POIs) contain the visit history, map check-in positions, recommended places, and route plans. In the context of user-preferred POI prediction with the map-zooming SZ-GAT framework, we propose a visualization for raster category exploration that uses tweet user visit history to represent the POI visit popularity of the raster units. We concentrate on the performance of the POI data visualized map layer zooming process, and our results show that the SZ-GAT framework outperforms the baselines on raster category regression. Raster category prediction will be used for urban area division, dynamic category feature extraction with user visit history, and government policy-making based on user behaviors of map tiles. This study promotes the progress of deep learning and data mining in the field of human geographic information.
  • Item
    Is a Pretrained Model the Answer to Situational Awareness Detection on Social Media?
    (2023-01-03) Lo, Siaw Ling; Lee, Kahhe; Zhang, Yuhao
    Social media can be valuable for extracting information about an event or incident on the ground. However, the vast amount of content shared and the linguistic variants of languages used on social media make it challenging to identify important situational awareness content to aid in decision-making for first responders. In this study, we assess whether pretrained models can be used to address the aforementioned challenges on social media. Various pretrained models, including static word embeddings (such as Word2Vec and GloVe) and contextualized word embeddings (such as DistilBERT), are studied in detail. According to our findings, a vanilla DistilBERT pretrained language model is insufficient to identify situational awareness information. Fine-tuning by using datasets of various event types and vocabulary extension is essential to adapt a DistilBERT model for real-world situational awareness detection.
  • Item
    What Do Customers Say About My Products? Benchmarking Machine Learning Models for Need Identification
    (2023-01-03) Stahlmann, Sven; Ettrich, Oliver; Kurka, Marco; Schoder, Detlef
    Needmining is the process of extracting customer needs from user-generated content by classifying it as either informative or uninformative regarding need content. Contemporary studies achieve this by utilizing machine learning. However, models found in the literature cannot be compared to each other because they use private data for training and testing. This study benchmarks all previously suggested needmining models including CNN, SVM, RNN, and RoBERTa. To ensure an unbiased comparison, this study samples and annotates a dataset of customer reviews for products from four different categories from Amazon. Henceforth, the dataset is publicly available and serves as a gold-set for future needmining benchmarks. RoBERTa outperformed other classifiers and seems to be best suited for needmining. The relevance of this study is reinforced by the fact that this benchmark creates a different hierarchy between models than otherwise suggested by comparing the results of previous studies.
  • Item
    Revisiting Review Depth in Search for Helpful Online Reviews
    (2023-01-03) Dorwat, Shardul; Namvar, Morteza; Akhlaghpour, Saeed
    This study investigates online review features that constitute review depth and assesses their impacts on review helpfulness. It develops a model capturing the moderating effects of heuristic and systematic cues of an online review on the relationship between review length and its helpfulness. In particular, this study examines the moderating effects of price, product type, review readability and the presence of two-sided arguments. For testing the model, a dataset of 568,454 reviews from 256,059 different reviewers on Amazon.com was analyzed. The variables were operationalized using text processing techniques and relationships were empirically tested using regression and machine learning models. The results highlight significant moderating effects of review readability and the presence of two-sided arguments on the relationship between review length and its helpfulness. However, the results did not confirm the moderating effects of price and product type. This article discusses the significant implications for a better understanding of review depth and helpfulness in e-commerce platforms.
  • Item
    Deploying Artificial Intelligence to Combat Covid-19 Misinformation on Social Media: Technological and Ethical Considerations
    (2023-01-03) Cartwright, Barry; Frank, Richard; Weir, George; Padda, Karmvir; Strange, Sarah-May
    This paper reports on research into online misinformation pertaining to the COVID-19 pandemic using artificial intelligence. This is part of our longer-term goal, i.e., the development of an artificial intelligence (machine-learning) tool to assist social media platforms, online service providers and government agencies in identifying and responding to misinformation on social media. We report herein on the predictive accuracy accomplished by applying a combination of technologies, including a custom-designed web-crawler, The Dark Crawler (TDC) and the Posit toolkit, a text-reading software solution designed by George Weir of University of Strathclyde. Overall, we found that performance of models based upon Posit-derived textual features showed high levels of correlation to the pre-determined (manual and machine-driven) data classifications. We further argue that the harms associated with COVID-19 misinformation — e.g., the social and economic damage, and the deaths and severe illnesses — outweigh the right to personal privacy and freedom of speech considerations.
  • Item
    CSR Communication on Twitter - A Scoping Review on Social Media Mining and Analytic Methods
    (2023-01-03) Pilgrim, Katharina; Koss, Jonathan; Bohnet-Joschko, Sabine
    Adopting corporate social responsibility (CSR) is becoming increasingly mandatory as international legislation puts pressure on companies to implement and report on appropriate CSR measures. As of 2024, a significant number of companies will need to report on CSR topics for the first time. To identify relevant topics that resonate best in the industry or even with one's own stakeholder groups and should therefore be picked up, addressed and reported on preferentially, social media mining (SMM) can be an efficient approach for companies. By reviewing applied SMM and analytic methods of Twitter data, we identified four methodological approaches that use algorithms to identify relevant CSR topics for companies to engage with. This scoping review thus provides a systematized overview of SMM pipelines for use, being equally relevant for academics and practitioners aiming at computational analysis of Twitter content regarding CSR activities and communication.
  • Item
    WallStreetBets: An Analysis of Investment Advice Democratization
    (2023-01-03) Buz, Tolga; De Melo, Gerard
    Reddit's WallStreetBets (WSB) community has come to prominence due to its role in the hype around GameStop and other meme stocks. Yet very little is known about the reliability of the investment advice disseminated on WSB. We investigate whether an anonymous, investment-focused community such as WSB can be a valuable source for investment advice and thus may constitute a way of democratizing access to financial knowledge. Our analysis reviews data spanning 28 months to assess how successful an investor relying on WSB recommendations could have been. We detect buy and sell signals and define a WSB portfolio based on the community's most popular stocks. Our evaluation shows that this portfolio has grown significantly, outperforming the S&P 500 over the reviewed time frame. We find that filtering for proactive posts yields higher returns and our review of the period before 2021 shows that the GameStop hype merely amplified previously existing characteristics.
  • Item
    What Can Online Doctor Reviews Tell Us? A Deep Learning Assisted Study of Telehealth Service
    (2023-01-03) Hao, Haijing; Zhang, Bin; Zhan, Yongcheng; Wu, Jiang
    The present study develops a novel deep learning method that assists text mining of online doctor reviews to extract underlying sentiment scores. Those scores can be used to estimate a healthcare service quality model to investigate how online doctor reviews impact the demand for online doctor consultation. Based on the data from the largest online health platforms in China, our model results show that the underlying sentiment scores have statistically significant impacts on the demand for online doctor consultation. Theoretically, the present study constructs an innovative deep learning algorithm with a better performance than four widely used text mining methods, which can be applied to text mining of many online forums or social media texts. Empirically, our model results show what factors impact the health service quality and online doctor consultation demand, and following those factors, healthcare professionals can improve their service.