Data, Text, and Web Mining for Business Analytics

Permanent URI for this collection

https://hdl.handle.net/10125/60547

Mining User-Generated Repair Instructions from Automotive Web Communities
(2019-01-08) Wambsganss, Thiemo; Fromm, Hansjörg
The objective of this research was to automatically extract user-generated repair instructions from large amounts of web data. An artifact has been created that classifies a web post as containing a repair instruction or not. Methods from Natural Language Processing are used to transform the unstructured textual information from a web post into a set of numerical features that can be further processed by different Machine Learning algorithms. The main contribution of this research lies in the design and prototypical implementation of these features. The evaluation shows that the created artifact can accurately distinguish posts containing repair instructions from other posts, such as those containing problem reports. With such a solution, a company can save much of the time and money previously needed to perform this classification task manually.
A Context Free Grammar for Key Noun-Phrase Extraction from Text
(2019-01-08) Liu, Ying
Data Stream Models for Predicting Adverse Events in a War Theater
(2019-01-08) Shi, Donghui; Zurada, Jozef; Karwowski, Waldemar; Guan, Jian
Predicting adverse events in a war theater has been an active area of research. Recent studies used machine learning methods to predict adverse events, with infrastructure development spending data as input variables. The goals of these studies were to find correlations between adverse events and human-social-infrastructure development projects, to identify the main factors behind them, and to reduce the occurrence of adverse events. With the existing methods, the predictions still have large errors compared with the real values. The reason could be that some significant variables are removed to comply with the constraints of soft computing models such as neural networks, fuzzy inference systems (FIS) and adaptive neuro-fuzzy inference systems (ANFIS), which work well with a smaller number of variables. In this paper, a data stream approach using three data stream regression algorithms, AMRules, TargetMean and FIMTDD, is proposed to predict the adverse events so that many more input variables can be included. The results show that the data stream methods generate better results than the machine learning methods used in the previous studies, thus helping us better understand the relationship between infrastructure development and adverse events. In addition, the data stream methods also outperform the traditional linear regression model. An important advantage of data stream methods is the ability to create and apply predictive models with a relatively small amount of memory and time. Finally, data stream methods allow the user to observe the error distribution over time, which supports a more accurate assessment of the performance of the resulting models.
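As a rough illustration of the data stream idea in the abstract above, the sketch below implements a mean-of-targets baseline in the spirit of TargetMean and evaluates it in test-then-train (prequential) order, recording the running error after every instance. This is a minimal sketch, not the authors' implementation: the class, the function name `prequential_mae`, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TargetMean:
    """Baseline stream regressor: predicts the mean of all targets seen so far."""
    n: int = 0
    total: float = 0.0

    def predict(self) -> float:
        # Before any target has been seen, fall back to 0.0 (an assumption).
        return self.total / self.n if self.n else 0.0

    def learn(self, y: float) -> None:
        # Constant memory: only a count and a running sum are stored.
        self.n += 1
        self.total += y


def prequential_mae(targets):
    """Test-then-train: predict each target before learning from it.

    Returns the running mean absolute error after each instance,
    so the error can be inspected over time.
    """
    model = TargetMean()
    abs_err_sum = 0.0
    curve = []
    for i, y in enumerate(targets, start=1):
        abs_err_sum += abs(y - model.predict())
        curve.append(abs_err_sum / i)
        model.learn(y)
    return curve


# Hypothetical target values, not data from the paper.
errors = prequential_mae([4.0, 6.0, 5.0, 7.0])
print(errors)
```

Because the model keeps only a count and a sum, memory use does not grow with the length of the stream, and the returned curve is the kind of error-over-time view the abstract mentions.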
Cross-Cultural Examination on Content Bias and Helpfulness of Online Reviews: Sentiment Balance at the Aspect Level for a Subjective Good
(2019-01-08) Nakayama, Makoto; Wan, Yun
Online reviews can be fraught with biases, especially for experience goods. Using multilingual sentiment analysis software, we examined the characteristics of review biases and helpfulness at the aspect level across two different cultures. First, we found a lopsidedness in the emotions expressed over the four key aspects of Japanese restaurant reviews between Japanese and Western consumers. Second, helpful reviews have sentiments expressed more evenly over those aspects than average for both Japanese and Western consumers. Third, however, there are significant differences between the two groups in how sentiments are spread over aspects. Westerners found reviews helpful when reviews focused less on food and more on service. In addition, Japanese customers were more concerned with savings, whereas Westerners paid attention to whether they were getting their money’s worth. These findings point to future research opportunities for leveraging sentiment analysis over key aspects of goods, particularly those of experience/subjective goods, across different cultures and customer profile categories.
Leveraging Indexical Pragmatics (OFIP) for Search Engine: An Ontology-based Approach
(2019-01-08) Liu, Dapeng; Yoon, Victoria Y
The relevance of search results is an important indicator of information retrieval performance. A domain-specific Search Engine (SE), distinct from a general web SE, focuses on a specific segment of online content and may increase search result relevance. Traditional methods to improve domain-specific SE precision depend heavily on query expansion, lexical analysis of texts, and large amounts of training data. These methods suffer from limited effectiveness and efficiency because expanded query terms and coarse language features bring in uncontrollable complexity and increase dimensionality. Our design, leveraging the integrated power of computational syntax, semantics, and indexical pragmatics, proposes an ontology-driven framework that is tailored to work in a dynamic Internet environment without large amounts of manually annotated training data. This article presents our design, which is essential for building a domain-specific SE, and its instantiation in the terrorism domain.
Unsupervised Ranking of Numerical Observations based on Magnetic Properties and Correlation Coefficient
(2019-01-08) Alattas, Khalid; Islam, Aminul; Kumar, Ashok; Bayoumi, Magdy
This paper presents a novel unsupervised algorithm for ranking numerical observations, a task that is important in many applications in computer science, especially in information retrieval (IR). The proposed algorithm shows how correlation coefficients between attribute values and the concept of magnetic properties can be exploited to rank multi-attribute numerical objects. One of the main reasons for using correlation coefficients between attribute values and the concept of magnetic properties is that they are easy to compute and interpret. Our proposed Unsupervised Ranking using Magnetic properties and Correlation coefficient (URMC) algorithm can use some or all of the numerical attributes of objects and can also handle objects with missing attribute values. The proposed algorithm overcomes a major limitation of the state-of-the-art technique while achieving excellent results.
CDMF: A Deep Learning Model based on Convolutional and Dense-layer Matrix Factorization for Context-Aware Recommendation
(2019-01-08) Gan, Mingxin; Ma, Yingxue; Xiao, Kejun
We propose a novel deep neural network based recommendation model named Convolutional and Dense-layer Matrix Factorization (CDMF) for context-aware recommendation, which combines multi-source information from item descriptions and tag information. CDMF adopts a convolutional neural network to extract hidden features from the item description, treated as a document, and then fuses them with tag information via a fully connected layer, thereby generating a comprehensive feature vector. Based on the matrix factorization method, CDMF predicts ratings from the fused information of both users and items. Experiments on a real dataset show that the proposed deep learning model clearly outperforms state-of-the-art recommendation methods.
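The fuse-then-factorize idea in the abstract above can be sketched as follows: item description features and tag features are concatenated, passed through one fully connected layer, and the resulting item vector is scored against a user vector with a dot product, as in matrix factorization. This is only an illustrative sketch under assumptions, not the CDMF architecture itself: the function names, the ReLU activation, the layer sizes, and all numbers are hypothetical, and the convolutional feature extractor is replaced by a fixed input vector.

```python
def dense(inputs, weights, biases):
    """Fully connected layer with ReLU: one output per weight row."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_rating(user_vec, doc_features, tag_features, weights, biases):
    """Fuse item features, then score with a matrix-factorization dot product."""
    # Concatenate description and tag features before the dense layer.
    item_vec = dense(doc_features + tag_features, weights, biases)
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical toy values; in practice doc_features would come from a CNN
# over the item description and all weights would be learned from ratings.
rating = predict_rating(
    user_vec=[1.0, 2.0],
    doc_features=[1.0, 0.0],
    tag_features=[0.0, 1.0],
    weights=[[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, -3.0]],
    biases=[0.0, 0.0],
)
print(rating)
```

In a trained model the user vector and the layer weights would be fitted jointly to observed ratings; here they are fixed by hand so the arithmetic is easy to follow.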
Sentence-Level Sentiment Analysis of Financial News Using Distributed Text Representations and Multi-Instance Learning
(2019-01-08) Lutz, Bernhard; Pröllochs, Nicolas; Neumann, Dirk
Predicting the Outcome of a Football Game: A Comparative Analysis of Single and Ensemble Analytics Methods
(2019-01-08) Eryarsoy, Enes; Delen, Dursun
As analytical tools and techniques advance, increasingly large numbers of researchers apply these techniques to a variety of sports. With nearly 4 billion followers, association football, or soccer, is estimated to be the most popular sport among fans across the world by a large margin. The objective of this study is to develop a model to predict the outcomes of soccer (or association football) games (win-loss-draw) and to determine the factors that influence game outcomes. We used 10 years of comprehensive game-level data spanning the years 2007-2017 in the Turkish Super League, and tested a variety of classifiers to identify the most promising methods for outcome prediction.
Introduction to the Minitrack on Data, Text, and Web Mining for Business Analytics
(2019-01-08) Delen, Dursun; Zolbanin, Hamed Majidi
