Data, Text, and Web Mining for Business Analytics

Recent Submissions

  • Item
    Use of Electronic Word of Mouth as Quality Metrics: A Comparison of Airline Reviews on Twitter and Skytrax
    ( 2022-01-04) Lu, Lin ; Mitra, Amit ; Wang, Yen-Yao ; Wang, Yu ; Xu, Pei
    User-generated content (UGC) on online platforms serves as a critical data source in the service industry, as it can be accessed in real time and reflects customers’ changing focus on service aspects. Drawing upon the importance-performance analysis framework, we propose a methodology to derive service quality metrics by applying customized text mining techniques to heterogeneous sources of UGC, and we examine the effectiveness of these quality metrics. UGC data related to major U.S. airlines were collected from non-social media (Skytrax) and social media (Twitter) platforms from 2014 to June 2019. The results suggest that the topic distributions and the UGC-derived weighted service quality (WSQ, which represents the weighted sentiment based on service aspects) vary significantly between the non-social media and social media platforms. In addition, the WSQ scores derived from the two platforms are significant indicators of the objective service quality measurement (i.e., airline quality rating), with stronger predictive power from the social-media-derived WSQ score.
  • Item
    The Devil is in The Details: Measuring Sensory Processing Sensitivity Using Natural Language Processing
    ( 2022-01-04) Yuan, Lingyao ; Zhang, Wenli ; Scheibe, Kevin
    Personality traits play a strong role in our perceptions, attitudes, and decision-making behaviors in our daily lives, including our choices of words and writing patterns. While prior Information Systems (IS) research on personality typically used the Big Five personality traits as a theoretical framework, we look into measuring a comparatively new inherent personality trait, sensory processing sensitivity, using natural language processing. We collect responses to twenty general essay questions, along with self-reported sensory processing sensitivity survey questions, from 241 participants. We categorize participants based on survey questions with multiple methods and derive different features from the textual data. Our results show almost perfect agreement among the different methods for categorizing a highly sensitive person versus a non-highly sensitive person. The initial analysis demonstrates that certain features show great potential for measuring sensory processing sensitivity in written text.
  • Item
    Looking Beyond Content: Modeling and Detection of Fake News from a Social Context Perspective
    ( 2022-01-04) Xiao, Kenan ; Wang, Longwei ; Gupta, Ashish ; Qin, Xiao
    The spread of fake news on social media has boosted the demand for reliable fake news detection techniques. Such dissemination of fake news can influence public opinion and society. Recently, a growing number of methods for detecting fake news have been proposed. However, most of these approaches have significant limitations in the timely detection of fake news. To facilitate early detection of fake news, we propose a unique framework, FNEPP (Fake News Engagement and Propagation Path), from a social context perspective, which explicitly combines news contents, user engagements, user characteristics, and the news propagation path as composite features of two collaborative modules. The engagement module captures news contents and user engagements, while the propagation path module learns global and local patterns of user characteristics and news dissemination. Experimental results on two real-world datasets demonstrate the effectiveness and efficiency of the proposed FNEPP framework.
  • Item
    Feature Extraction for Polish Language Named Entities Recognition in Intelligent Office Assistant
    ( 2022-01-04) Denisiuk, Aleksander ; Ganzha, Maria ; Wasielewska-Michniewska, Katarzyna ; Paprzycki, Marcin
    The purpose of this contribution is to present a feature extractor that was designed as a part of a Named Entity Recognition (NER) system, which is to be used in a Robotic Process Automation application with a self-learning ability. The NER system takes a screen of the user interface as its input and tries to recognize and categorize all the named entities that can be located within this screen. The set of features that can be extracted from the input is discussed in the article. The local context features appear to be very important in the considered problem. Experiments show that the entities are recognized at a rate that is satisfactory from the business perspective.
  • Item
    Analogical Reasoning: An Algorithm Comparison for Natural Language Processing
    ( 2022-01-04) Combs, Kara ; Bihl, Trevor ; Ganapathy, Subhashini ; Staples, Drue
    There is a continual push to make Artificial Intelligence (AI) as human-like as possible; however, this is a difficult task. A significant limitation is the inability of AI to learn beyond its current comprehension. Analogical reasoning (AR), whereby learning by analogy occurs, has been proposed as one method to achieve this goal. Current AR models have their roots in symbolist, connectionist, or hybrid approaches, which indicate how analogies are evaluated. No current studies have compared psychologically-inspired and natural language processing (NLP)-produced algorithms to one another; this study compares seven AR algorithms from both realms on multiple-choice word-based analogy problems. Assessment is based on two metrics: “correctness,” the selection of the correct answer, and “goodness,” the algorithm’s predicted similarity score compared to the “ideal” score. Psychologically-based models have an advantage based on our metrics; however, there is not a clear one-size-fits-all algorithm for all AR problems.
  • Item
    A computational method to track the evolution of business models in the Digital Economy
    ( 2022-01-04) Wood, Zena ; Walker, David ; Parry, Glenn
    Companies within the Digital Economy are evolving their business models as they take advantage of the opportunities afforded by emerging digital technologies. There is a need to develop methods that will allow researchers and policy makers to understand the existence of, and relationships between, the different business models within the Digital Economy and track their evolution. Such methods could also help quantify the size and growth of the Digital Economy. This paper presents a computational method, which utilizes machine learning and web scraping, to identify new business models, and a taxonomy of organisations, through the analysis of a firm’s webpage. The work seeks to provide an autonomous tool that regularly reports trends in the number of firms in a market, their business models, and changes in activity from product to service over time. This information would provide valuable and actionable insight for researchers, firms and markets.
  • Item
    Introduction to the Minitrack on Data, Text, and Web Mining for Business Analytics
    ( 2022-01-04) Davazdahemami, Behrooz ; Zolbanin, Hamed ; Delen, Dursun