Data, Text and Web Mining for Business Analytics

Recent Submissions

  • Item
    Text-based Causality Modeling with a Conceptual Label in a Hierarchical Topic Structure Using Bayesian Rose Trees
    (2021-01-05) Ogawa, Takuro ; Saga, Ryosuke
    This paper describes a method for constructing a causality model from review text data. Review text data include the evaluation factors of rating, and causality model extraction from text data is important for understanding the evaluation factors and their relationships. Several methods are available for extracting causality models by using a topic model. In particular, the method based on hierarchical latent Dirichlet allocation is useful for hierarchically comprehending causality structure. However, the depth of each topic in a hierarchical structure is forcefully pruned even if granularities differ for each topic. Thus, interpreting a hierarchical topic structure is difficult. To solve these problems, we construct a hierarchical topic structure with different depths by using Bayesian rose trees. Furthermore, we use conceptual labeling to add explicit semantics for each topic for interpretation. An experiment using actual data confirms that this model is accurate and interpretable.
  • Item
    Neural Machine Translation for Conditional Generation of Novel Procedures
    (2021-01-05) Geluykens, Joppe ; Mitrović, Sandra ; Ortega Vázquez, Carlos Eduardo ; Laino, Teodoro ; Vaucher, Alain ; De Weerdt, Jochen
    Procedural knowledge is generally dispersed across many experts within or across organizations, which might lead to inefficiencies and redundancy. Historically, computers have been well suited to store procedural knowledge but they have lacked the capability to produce natural language text. Nonetheless, recent advances in machine learning permit a higher linguistic coherence which benefits applications with longer text outputs such as procedures. This work closes the gap between human experts and computers by proposing a framework for automatic, computer generation of procedures based on neural machine translation and the BART model. Furthermore, we define two benchmark problems for procedure generation and establish a set of evaluation metrics that can be used as a reference in further work. We demonstrate the potential of this solution on the task of generating cooking recipes based on available ingredients. The evaluation results on the Recipe1M dataset showcase the method's superiority over other, fairly novel, neural architectures.
  • Item
    Mining Logomaps for Ecosystem Intelligence
    (2021-01-05) Basole, Rahul
    Ecosystem intelligence is typically based on highly structured data. More recently, we have seen a growth in extracting knowledge from unstructured textual data sources. Yet, one form of unstructured data has largely been ignored in ecosystem intelligence: image-based data. With an increased use of images and graphics in corporate presentations, social media posts, and annual reports, there is a greater need and opportunity to mine this potentially trapped knowledge. We introduce and describe a human-assisted knowledge discovery approach applied to one particular type of image-based data, namely logomaps, combining image recognition, graph modeling, and visualization to provide insights into business ecosystems. We demonstrate the logomap mining method through a case study of the emerging artificial intelligence (AI) ecosystem and conclude with a discussion of implications and future work.
  • Item
    Business process analysis based on anomaly detection in event logs: a study on an incident management case
    (2021-01-05) Rojas Krugger, Esther Maria ; Maita, Ana Rocío Cárdenas ; Alves, Juliana Cristina Barbosa ; Fantinato, Marcelo ; Marques Peres, Sarajane
    Business processes allow anomalies to occur during execution. Anomaly detection aims to discover behaviors that are not typical or expected in the business process. In fact, early detection helps prevent intrusion and other risks in companies. There are several approaches that address this problem in process mining. This paper discusses anomaly detection approaches in business process discovery using a real-world event log from an ITIL-covered incident management process. We discuss benefits and limitations of using knowledge from process models discovered after treating anomalies.
  • Item
    A Taxonomy for Deep Learning in Natural Language Processing
    (2021-01-05) Landolt, Severin ; Wambsganss, Thiemo ; Söllner, Matthias
    Despite a large number of available techniques around Deep Learning in Natural Language Processing (NLP), no holistic framework exists which supports researchers and practitioners to organise knowledge when designing, comparing and evaluating NLP applications. This paper addresses this lack of a holistic framework by developing a taxonomy for Deep Learning in Natural Language Processing. Based on a systematic literature review as proposed by Webster and Watson and vom Brocke et al. and the iterative taxonomy development process of Nickerson et al., we derived five novel dimensions and 38 characteristics based on a sample of 205 papers. Our research suggests that a Deep Learning NLP approach can be distinguished by five dimensions which were partly derived from the CRISP-DM methodology: application understanding, data preparation, modeling, learning technique and evaluation. We therefore hope to provide guidance and support for researchers and practitioners when using Deep Learning for NLP to design, compare and evaluate NLP applications.
  • Item
    ALGA: Automatic Logic Gate Annotator for Building Financial News Events Detectors
    (2021-01-05) Bainiaksinaite, Julija ; Kaplis, Dr Nikolaos ; Treleaven, Prof Philip
    We present a new automatic data labelling framework called ALGA - Automatic Logic Gate Annotator. The framework helps to create large amounts of annotated data more quickly for training domain-specific financial news events detection classifiers. The ALGA framework implements a rules-based approach to annotate a training dataset. This method has the following advantages: 1) unlike traditional data labelling methods, it helps to filter relevant news articles from noise; 2) it allows easier transferability to other domains and better interpretability of models trained on automatically labelled data. To create this framework, we focus on U.S.-based companies that operate in the Apparel and Footwear industry. We show that event detection classifiers trained on the data generated by our framework can achieve state-of-the-art performance in the domain-specific financial events detection task. Besides, we create a domain-specific events synonyms dictionary.
  • Item
    Introduction to the Minitrack on Data, Text and Web Mining for Business Analytics
    (2021-01-05) Delen, Dursun ; Zolbanin, Hamed ; Davazdahemami, Behrooz