Data, Text, and Web Mining for Business Analytics Minitrack

Data mining is the process of discovering valid, novel, potentially useful, and ultimately understandable patterns (i.e., nuggets of knowledge) in data stored in structured databases, where the data is organized in records populated by categorical, ordinal and continuous variables. Text mining, on the other hand, refers to the very same discovery process as it applies to unstructured data sources including business documents, customer comments, Web pages, and XML files.

This minitrack focuses on decision support aspects of advanced analytics, with emphasis on data, text and Web mining. Topic areas covered in this minitrack include, but are not limited to:

  • New methods and algorithms of data/text/Web mining
  • New and improved processes and methodologies of conducting data/text/Web mining
  • Research on data acquisition, integration, and pre-processing for data/text/Web mining, such as novel approaches to data integration/transformation/characterization, data cleaning/scrubbing, data sampling, data reduction, and data visualization
  • Novel, interesting and impactful applications of data/text/Web mining for better managerial decision making
  • Ethical and privacy issues in data/text/Web mining
  • Futuristic directions for data/text/Web mining in the era of Big Data analytics

Accepted papers will be considered for fast-tracking into a special issue on Data Mining & Decision Analytics in the Decision Sciences journal. This special issue welcomes the best submissions of HICSS-50. Participants in this minitrack are strongly encouraged to submit extended and enriched versions of their papers to this special issue, fully conforming to the specifications of the Decision Sciences journal as outlined in its Author Guidelines. See CFP for additional information.

Minitrack Co-chairs:

Dursun Delen (Primary Contact)
Oklahoma State University

Enes Eryarsoy
Sehir University, Turkey

Şadi E. Şeker
Istanbul Medeniyet University


Recent Submissions

  • The Impact of Subjective and Objective Experience on Mobile Banking Usage: An Analytical Approach
    (2017-01-04) Albashrawi, Mousa; Kartal, Hasan; Oztekin, Asil; Motiwalla, Luvai
    This paper aims to investigate mobile banking (MB) usage through the theoretical lens of the UTAUT model with its four pillars. The research model will be tested via a hybrid neural network-based structural equation modeling approach (SEM-NN) to reveal significant factors. Universal structural modeling (USM) will then be used to find hidden paths and nonlinearity in the research model. To the best of our knowledge, this is the first study to examine the role of subjective and objective experience on MB usage using a multi-analytical approach. Neural networks (NN) and USM can identify the most significant determinants and hidden interaction effects, respectively. Thus, both techniques help to complement SEM and increase our understanding of the factors that influence MB usage. Preliminary results are presented and discussed, and potential contributions and conclusions are communicated to both academia and industry.
  • The Impact of Content, Context, and Creator on User Engagement in Social Media Marketing
    (2017-01-04) Jaakonmäki, Roope; Müller, Oliver; vom Brocke, Jan
    Social media has become an important tool in establishing relationships between companies and customers. However, creating effective content for social media marketing campaigns is a challenge, as companies have difficulty understanding what drives user engagement. One approach to addressing this challenge is to use analytics on user-generated social media content to understand the relationship between content features and user engagement. In this paper we report on a quantitative study that applies machine learning algorithms to extract textual and visual content features from Instagram posts, along with creator- and context-related variables, and to statistically model their influence on user engagement. Our findings can guide marketing and social media professionals in creating engaging content that communicates more effectively with their audiences.
  • Improving Sentiment Analysis with Document-Level Semantic Relationships from Rhetoric Discourse Structures
    (2017-01-04) Märkle-Huß, Joscha; Feuerriegel, Stefan; Prendinger, Helmut
    Conventional sentiment analysis usually neglects semantic information between (sub-)clauses, as it merely implements so-called bag-of-words approaches, where the sentiment of individual words is aggregated independently of the document structure. Instead, we advance sentiment analysis by the use of rhetoric structure theory (RST), which provides a hierarchical representation of texts at document level. For this purpose, texts are split into elementary discourse units (EDU). These EDUs span a hierarchical structure in the form of a binary tree, where the branches are labeled according to their semantic discourse. Accordingly, this paper proposes a novel combination of weighting and grid search to aggregate sentiment scores from the RST tree, as well as feature engineering for machine learning. We apply our algorithms to the especially hard task of predicting stock returns subsequent to financial disclosures. As a result, machine learning improves the balanced accuracy by 8.6 percent compared to the baseline.
  • Data Integration and Predictive Analysis System for Disease Prophylaxis
    (2017-01-04) Erraguntla, Madhav; Freeze, John; Delen, Dursun; Madanagopal, Karthic; Mayer, Ric; Khojasteh, Jam
    The goal of the Data Integration and Predictive Analysis System (IPAS) is to enable prediction, analysis, and response management for incidents of infectious diseases. IPAS collects and integrates comprehensive datasets of previous disease incidents and potential influencing factors to facilitate multivariate, predictive analytics of disease patterns, intensity, and timing. IPAS supports comprehensive epidemiological analysis: exploratory spatial and temporal correlation, hypothesis testing, prediction, and intervention analysis. Machine learning and predictive analytical techniques such as support vector machines (SVM), decision tree-based random forests, and boosting are used to predict disease epidemic curves. Predictions are then displayed to stakeholders in a disease situation awareness interface, alongside disease incidents and syndromic and zoonotic details extracted from news sources and medical publications. Data on Influenza-Like Illness (ILI) provided by the CDC were used to validate the capability of the IPAS system, with plans to expand to other illnesses in the future. This paper presents the ILI prediction modeling results as well as the IPAS system features.
  • Consumer-Oriented Tech Mining: Integrating the Consumer Perspective into Organizational Technology Intelligence - The Case of Autonomous Driving
    (2017-01-04) Egger, Marc; Schoder, Detlef
    To avoid missing technological opportunities and to counteract risks, organizations have to scan and monitor developments in the external environment through a structured process of technology intelligence. Previous approaches in tech mining (the application of text mining for technology intelligence) have primarily focused on the elicitation of technical or legal information from web, patent, or research databases. However, knowledge of consumers’ needs, fears, and hopes is a prerequisite for the success of an emerging technology in the marketplace. Thus, we claim that technology intelligence needs to also consider consumers’ technology perceptions. Hence, we propose a novel and comprehensive approach to collect user-generated content from the web and apply text mining to derive consumer perceptions. In doing so, we align with an established tech-mining process. This paper illustrates our approach on the emerging technology of autonomous driving and provides an initial indication of concurrent validity.
  • A Roadmap for Natural Language Processing Research in Information Systems
    (2017-01-04) Liu, Dapeng; Li, Yan; Thomas, Manoj A.
    Natural Language Processing (NLP) is now widely integrated into web and mobile applications, enabling natural interactions between humans and computers. Although many NLP studies have been published, none have comprehensively reviewed or synthesized the tasks most commonly addressed in NLP research. We conduct a thorough review of IS literature to assess the current state of NLP research, and identify 12 prototypical tasks that are widely researched. Our analysis of 238 articles in Information Systems (IS) journals between 2004 and 2015 shows an increasing trend in NLP research, especially since 2011. Based on our analysis, we propose a roadmap for NLP research, and detail how it may be useful to guide future NLP research in IS. In addition, we employ Association Rules (AR) mining for data analysis to investigate co-occurrence of prototypical tasks and discuss insights from the findings.
  • Introduction to Data, Text and Web Mining for Business Analytics Minitrack
    (2017-01-04) Delen, Dursun; Eryarsoy, Enes; Şeker, Şadi