Data, Text, and Web Mining for Business Analytics

Recent Submissions

  • Item
    Deep Learning in Predicting Real Estate Property Prices: A Comparative Study
    ( 2023-01-03) Shi, Donghui ; Zhang, Hui ; Guan, Jian ; Zurada, Jozef ; Chen, Zejun ; Li, Xiyang
    The dominant methods for real estate property price prediction or valuation are based on multiple regression. Regression-based methods are, however, imperfect because they suffer from issues such as multicollinearity and heteroscedasticity. Recent years have witnessed the use of machine learning methods, but the results are mixed. This paper applies deep learning models to real estate property price prediction, aiming to improve prediction accuracy on data representing sales transactions in a large metropolitan area. Three deep learning models, LSTM, GRU and Transformer, are created and compared with other machine learning and traditional models. The results obtained for the data set with all features clearly show that the random forest (RF) and Transformer models outperformed the other models. LSTM and GRU models produced the worst results, suggesting that they are perhaps not suitable for predicting real estate prices. Furthermore, the implementations of Transformer and RF on a data set with feature reduction produced even more accurate prediction results. In conclusion, our research shows that the performance of the Transformer model is close to that of the RF model. Both models produce significantly better prediction results than existing approaches in terms of accuracy.
  • Item
    Integration of Computer Vision with Analogical Reasoning for Characterizing Unknowns
    ( 2023-01-03) Combs, Kara ; Bihl, Trevor ; Ganapathy, Subhashini
    Current state-of-the-art artificial intelligence struggles with accurate interpretation of out-of-library (OOL) objects. One proposed remedy is analogical reasoning (AR), which utilizes abductive reasoning to draw inferences on an unfamiliar scenario given knowledge about a similar familiar scenario. Currently, applications of visual AR gravitate toward analogy-formatted image problems rather than computer vision data sets. The Image Recognition Through Analogical Reasoning Algorithm (IRTARA) approach described herein shows how AR can be leveraged to improve computer vision in OOL situations. IRTARA produces a word-based term frequency list that characterizes the OOL object of interest. To evaluate the quality of the results of IRTARA, both quantitative and qualitative assessments are used, including a baseline to compare the automated methods with human-generated results. Fifteen OOL objects were tested using IRTARA, which showed consistent results across all three evaluation methods on the objects that performed exceptionally well or poorly overall.
  • Item
    A Practical and Empirical Comparison of Three Topic Modeling Methods Using a COVID-19 Corpus: LSA, LDA, and Top2Vec
    ( 2023-01-03) Zengul, Ferhat ; Bulut, Aysegul ; Oner, Nurettin ; Ahmed, Abdulaziz ; Yadav, Manju ; Gray, Hope Gracie ; Ozaydin, Bunyamin
    This study was prepared as a practical guide for researchers interested in using topic modeling methodologies. It is intended especially for those who have difficulty determining which methodology to use. Many topic modeling methods have been developed since the 1980s, namely latent semantic indexing or analysis (LSI/LSA), probabilistic LSI/LSA (pLSI/pLSA), naïve Bayes, the Author-Recipient-Topic (ART), Latent Dirichlet Allocation (LDA), Topic Over Time (TOT), Dynamic Topic Models (DTM), Word2Vec, Top2Vec, and variations and combinations of these techniques. Researchers from disciplines other than computer science may find it challenging to select a topic modeling methodology. We compared a recently developed topic modeling algorithm, Top2Vec, with two of the most conventional and frequently used methodologies, LSA and LDA. As a study sample, we used a corpus of 65,292 COVID-19-focused abstracts. Among the 11 topics we identified in each methodology, we found high levels of correlation between LDA and Top2Vec results, followed by LSA and LDA, and then Top2Vec and LSA. We also provided information on the computational resources we used to perform the analyses and provided practical guidelines and recommendations for researchers.
  • Item
    Your Sentiment Matters: A Machine Learning Approach for Predicting Regime Changes in the Cryptocurrency Market
    ( 2023-01-03) Parra-Moyano, José ; Partida, Daniel ; Gessl, Moritz
    Research suggests that a significant number of those investing in cryptocurrencies do not follow what we might call rational, profit-maximizing behavior. We also know that with the progressive lowering of entry barriers to online trading platforms, an increasing number of inexperienced investors are investing in cryptocurrencies. Increasingly, the behavior of investors contradicts the predictions made by traditional financial models and challenges the assumptions on which such models have previously relied when anticipating returns on cryptocurrency investments. To overcome this issue, we develop a random forest model which we train with features stemming from a sentiment analysis performed on data generated by cryptocurrency enthusiasts using Twitter, Google Trends, and Reddit. Our findings show that such features have an important role to play in capturing the behavior of cryptocurrency investors and increase our model’s ability to anticipate regime changes in the cryptocurrency market. Our model outperforms the predictive ability of the Log-Periodic Power Law model, currently the model most widely used to predict regime changes in financial markets. These results imply that scholars and practitioners aiming to understand and predict the development of cryptocurrency markets stand to benefit from analyzing social media data generated by cryptocurrency enthusiasts.
  • Item
    Introduction to the Minitrack on Data, Text, and Web Mining for Business Analytics
    ( 2023-01-03) Delen, Dursun ; Davazdahemami, Behrooz
  • Item
    Multi-Domain Named Entity Recognition for Robotic Process Automation
    ( 2023-01-03) Ganzha, Maria ; Denisiuk, Aleksander ; Sowiński, Piotr ; Wasielewska-Michniewska, Katarzyna ; Paprzycki, Marcin
    To make Robotic Process Automation more attractive, it needs to become more "intelligent". In this context, a modification of the Form-to-Rule approach, based on identifying data types of form fields, is proposed. Moreover, multi-domain named entity recognition is used for field value identification. These techniques, used jointly, allow software robots to adapt to interface changes. Experimental results are reported and verify the viability of the proposed approach.
  • Item
    Detecting Feature Requests of Third-Party Developers through Machine Learning: A Case Study of the SAP Community
    ( 2023-01-03) Kauschinger, Martin ; Vieth, Niklas ; Schreieck, Maximilian ; Krcmar, Helmut
    The elicitation of requirements is central to the development of successful software products. While traditional requirement elicitation techniques such as user interviews are highly labor-intensive, data-driven elicitation techniques promise enhanced scalability through the exploitation of new data sources like app store reviews or social media posts. For enterprise software vendors, requirements elicitation remains challenging because app store reviews are scarce and vendors have no direct access to users. Against this background, we investigate whether enterprise software vendors can elicit requirements from their sponsored developer communities through data-driven techniques. Following the design science methodology, we collected data from the SAP Community and developed a supervised machine learning classifier, which automatically detects feature requests of third-party developers. Based on a manually labeled data set of 1,500 questions, our classifier reached a high accuracy of 0.819. Our findings reveal that supervised machine learning models are an effective means for the identification of feature requests.