Data, Text and Web Mining for Business Analytics

Recent Submissions

  • Quantitative Considerations about the Semantic Relationship of Entities in a Document Corpus
    (2018-01-03) Schmidt, Andreas; Scholz, Steffen
    Providing suggestions to internet users is an important task today. For example, when we enter a search string into the Google interface, it suggests further terms based on queries previously formulated by other users of the search engine. In an entity-based search engine, entity suggestion is likewise important when the user specifies entities. This feature can also be used to suggest further entities that are related to entities already specified. If the suggestions are appropriate, the user can formulate the search request very quickly. If the suggestions are derived from the search corpus itself, new and previously unknown relationships between entities can be discovered along the way. The aim of this paper is a quantitative analysis of relationships between entities in a large document corpus with a view to providing entity suggestions in real time.
  • Customer Lifetime Value Prediction in Non-Contractual Freemium Settings: Chasing High-Value Users Using Deep Neural Networks and SMOTE
    (2018-01-03) Sifa, Rafet; Runge, Julian; Bauckhage, Christian; Klapper, Daniel
    In non-contractual freemium and sharing economy settings, a small share of users often generates the largest part of a firm's revenue and co-finances the free provision of the product or service to a large number of users. Successfully retaining and upselling such high-value users can be crucial to firms' survival. Predictions of customer Lifetime Value (LTV) are a widely used tool to identify high-value users and inform marketing initiatives. This paper frames the related prediction problem and applies a number of common machine learning methods to predict individual-level LTV. Because only a small subset of users ever makes a purchase, the data are highly imbalanced. The study therefore combines these methods with synthetic minority oversampling (SMOTE) in an attempt to improve prediction performance. Results indicate that data augmentation with SMOTE improves prediction performance for premium and high-value users, especially when combined with deep neural networks.
  • Data Integration and Predictive Analysis System for Disease Prophylaxis: Incorporating Dengue Fever Forecasts
    (2018-01-03) Freeze, John; Erraguntla, Madhav; Verma, Akshans
    The goal of the Data Integration and Predictive Analysis System (IPAS) is to enable prediction, analysis, and response management for incidents of infectious disease. IPAS collects and integrates comprehensive datasets of previous disease incidents and potential influencing factors to facilitate multivariate, predictive analysis of disease patterns, intensity, and timing. We have used IPAS technology to generate successful forecasts of Influenza-Like Illness (ILI). In this study, IPAS was extended to forecast Dengue fever in the cities of San Juan, Puerto Rico, and Iquitos, Peru. Data provided by the National Oceanic and Atmospheric Administration (NOAA) were processed and used to build prediction models. Using modern machine learning algorithms, we produced one-week and four-week forecasts of Dengue incidence in each city. Prediction model results are presented along with the features of IPAS.
  • Understanding Topic Models in Context: A Mixed-Methods Approach to the Meaningful Analysis of Large Document Collections
    (2018-01-03) Eickhoff, Matthias; Wieneke, Runhild
    In recent years, we have witnessed an unprecedented proliferation of large document collections. This development has created the need for appropriate analytical means. In particular, to capture the thematic composition of large document collections, researchers increasingly draw on quantitative topic models, one of the most prominent of which is Latent Dirichlet Allocation (LDA). Yet these models have significant drawbacks; for example, the generated topics lack context and therefore meaningfulness. Prior research has rarely addressed this limitation through the lens of mixed-methods research. Our paper addresses this gap by proposing a structured mixed-methods approach to the meaningful analysis of large document collections. Specifically, we draw on qualitative coding and quantitative hierarchical clustering to validate and enhance topic models through re-contextualization. To illustrate the proposed approach, we conduct a case study of the thematic composition of the AIS Senior Scholars' Basket of Journals.
  • Introduction to the Minitrack on Data, Text and Web Mining for Business Analytics
    (2018-01-03) Delen, Dursun; Zolbanin, Hamed Majidi