Data Science and Machine Learning to Support Business Decisions

Permanent URI for this collection

https://hdl.handle.net/10125/109864

Archetype Discovery from Taxonomies: A Method to Cluster Small Datasets of Categorical Data
(2025-01-07) Lenssen, Lars; Stahmann, Philip; Janiesch, Christian; Schubert, Erich
This study investigates the challenges of clustering small categorical datasets, particularly in the context of taxonomy-based archetype formation. Taxonomies, such as the Linnaean system, are vital for organizing knowledge across diverse domains and can be used as code books. Archetypes then represent common patterns across the entities. While cluster analysis is a powerful tool for uncovering unknown patterns, traditional clustering methods are predominantly distance-based and optimized for continuous data, which makes them inadequate for categorical data where similarity is not easily quantifiable. Common distance measures, like Euclidean and Manhattan distances, fail to capture meaningful relationships in categorical datasets. This work addresses this gap by exploring information-theoretic approaches to develop a novel clustering method, CatRED, tailored for small categorical datasets such as taxonomy data. We evaluate our method through its application to two taxonomy datasets, demonstrating its effectiveness in generating archetypes.
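The point about distance measures can be illustrated with a minimal sketch. It is not the CatRED method, whose details the abstract does not give; the category names, the integer encoding, and the helper functions are all hypothetical. It shows how Euclidean distance on arbitrarily coded categories invents an ordering, while a simple mismatch count and a Shannon entropy do not.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mismatch(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def attribute_entropy(records, index):
    """Shannon entropy (in bits) of one categorical attribute in a group."""
    counts = Counter(record[index] for record in records)
    total = len(records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical categories with an arbitrary integer encoding.
encode = {"red": 0, "green": 1, "blue": 2}

# Euclidean distance on the codes suggests red is closer to green than to blue,
# although the categories have no inherent order.
d_red_green = euclidean([encode["red"]], [encode["green"]])
d_red_blue = euclidean([encode["red"]], [encode["blue"]])

# The mismatch count treats every pair of distinct categories the same.
m_red_green = mismatch(("red",), ("green",))
m_red_blue = mismatch(("red",), ("blue",))

# Entropy is 0 bits for a pure group and 1 bit for an even two-way split.
pure_group = attribute_entropy([("red",), ("red",)], 0)
mixed_group = attribute_entropy([("red",), ("blue",)], 0)
```

Entropy is only one possible information-theoretic quantity; it is used here because it depends on category frequencies alone and not on any numeric coding.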
Vehicle Type Recognition Based on Audio Data
(2025-01-07) Kobiela, Dariusz; Hajdasz, Michał; Erezman, Mateusz; Nurzyńska, Karolina; Zaporowski, Szymon; Kurowski, Adam; Weichbroth, Paweł
Identifying different vehicle types can help manage traffic more efficiently, reduce congestion, and improve public safety. This study aims to create a classification model that can recognize vehicle types based on the sound of passing vehicles. To achieve this, a database of raw audio files containing 1763 samples from two sources was assembled. The time-domain signals were converted to a time-frequency representation using the short-time Fourier transform to generate Mel Spectrograms. Mel-frequency Cepstral Coefficients (MFCCs) were also generated using the discrete cosine transform. In our experiments, we compared these approaches. Since the data was imbalanced, we applied online augmentation. Based on the literature review, we chose a Convolutional Neural Network (CNN) classifier because it is particularly well suited for analyzing large datasets due to its automatic feature extraction, parameter sharing, and sparsity. The results showed that Mel Spectrograms were more effective than MFCCs for audio data preprocessing in this particular use case, achieving the highest accuracy of 0.875 and the highest F1-score of 0.877.
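The relationship between the two feature types can be sketched as follows: MFCCs are commonly obtained by applying a type-II discrete cosine transform to the logarithm of the Mel-band energies of a single frame. This is a minimal, unnormalized illustration and not the authors' pipeline; the function names, the input values, and the choice of the 2595·log10(1 + f/700) Mel formula (one common variant) are assumptions.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (one common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def dct_ii(values):
    """Unnormalized type-II discrete cosine transform of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]

def mfcc_from_mel_energies(mel_energies, num_coeffs):
    """Log-compress the Mel-band energies of one frame, then keep the
    first num_coeffs DCT coefficients."""
    log_energies = [math.log(e) for e in mel_energies]
    return dct_ii(log_energies)[:num_coeffs]

# Hypothetical frame with four equal Mel-band energies: a flat log spectrum
# puts everything into coefficient 0 and leaves the others near zero.
flat_frame = [math.e] * 4
coeffs = mfcc_from_mel_energies(flat_frame, 3)
```

In practice the Mel-band energies would come from a short-time Fourier transform followed by a Mel filterbank, usually computed with an audio library rather than by hand.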
Inside the Driver’s Mind: Mapping Customers’ Usage Behavior of Advanced Driving Assistance Systems
(2025-01-07) Micus, Christian; Ramalingam, Gouthaman; Steiner, Benedikt; Böttcher, Timo
This study addresses the critical gap in understanding advanced driving assistance systems (ADAS) usage by exploring driving behavior profiles. We use a Bayesian Gaussian Mixture Model to analyze a substantial dataset of 232,849 drives from 55,864 vehicles in January 2022. Our results unveil six distinct vehicle usage profiles, shedding light on the correlations between individual ADAS usage and influencing factors such as customer driving behavior, mobility patterns, and the environmental and vehicle context. Furthermore, we map these profiles to Rogers’ innovation adoption groups, providing valuable insights into adoption dynamics. The findings contribute to classifying diverse car usage profiles, enabling the identification and prioritization of cluster-specific needs. Theoretical implications arise from the nuanced understanding of ADAS usage changes, contributing to the theoretical understanding of user adoption behaviors and preferences. These insights present new opportunities for personalizing functions and interactions to meet the evolving needs of distinct driver profiles.
LLM-based Textualization from Illustrated Path Diagram
(2025-01-07) Saga, Ryosuke; Liu, Songyi
Structural Equation Modeling (SEM) is widely used for causal analysis, but the path diagrams it generates are difficult to understand for those without knowledge of SEM. Describing the path diagrams that result from SEM in writing therefore lets people who lack the relevant knowledge obtain useful information from them. We propose a method to convert path diagrams into descriptive text based on a Large Language Model.
Beyond the Bell: Leveraging Off-market Data for AI-enabled Stock Directionality Forecast
(2025-01-07) Mukherjee, Himadri; Das, Suchismita; Bose, Kaustav; Kumar, Kundan; Paul, Souren; Ghosh, Alo
Stock directionality forecasts are extremely useful in the financial market, aiding more informed trading decisions. However, such forecasting is difficult due to the highly volatile nature of the stock market. Most stock trading takes place during regular market hours, and data from these hours is what forecasts mostly use. Trades are also executed before the market opens (pre-market) and after the market closes (post-market). This off-market trading data is often ignored due to its minute trading volume. Exploration of this data for stock market forecasting is in its nascent state. We forecast the directionality of the end-of-the-day price using this off-market data along with regular market hour data. The proposed AI-enabled framework extracts useful features from the off-market data and 15 technical indicators from regular market data, followed by a tree-based prediction approach. The obtained results show performance improvements of over 7% in closing price directionality forecasts when the off-market hour-based features are incorporated.
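To make the forecasting target concrete, here is a minimal sketch of how a directionality label and one off-market feature might be computed. The abstract does not define either precisely, so the label rule (1 if the close is above the previous close, else 0), the pre-market gap feature, and all prices are hypothetical.

```python
def direction_labels(closes):
    """Label each day after the first: 1 if its close is above the
    previous day's close, otherwise 0 (assumed definition)."""
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]

def premarket_gap(prev_close, premarket_last):
    """Relative move from the previous regular-hours close to the last
    pre-market trade price (a hypothetical off-market feature)."""
    return (premarket_last - prev_close) / prev_close

# Hypothetical daily closing prices.
closes = [100.0, 101.5, 101.0, 103.0]
labels = direction_labels(closes)

# Hypothetical pre-market price observed before the next session opens.
gap = premarket_gap(100.0, 101.0)
```

A feature like this would be one input among many to a tree-based classifier; the sketch stops at feature and label construction.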
Enhancing Remaining Time Prediction in Business Processes through Graph Embedding
(2025-01-07) Rodrigues Neubauer, Thais; Peeperkorn, Jari; De Weerdt, Jochen; Fantinato, Marcelo; Marques Peres, Sarajane
Accurately predicting the remaining time of business processes is essential for operational efficiency but remains challenging due to the complex interdependencies among process activities. Traditional approaches often fail to capture these complexities effectively. This paper introduces an approach to improving remaining time prediction through the application of graph embedding to enrich the representation of process activities. The proposed approach enriches the data representation used for model training and is agnostic to the prediction algorithm. We detail the graph design and explore embedding parameters, applying them to real-world event logs. Our experimental study demonstrates that our approach can reduce percentage prediction error rates by up to 35% compared to traditional methods, showing the effectiveness of graph embeddings in improving predictive accuracy in complex business environments.
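The prediction target itself can be sketched in a few lines: for every prefix of a completed trace, the remaining time is the gap between the prefix's last event and the end of the trace. The sketch covers only this target computation, not the graph embedding step, and the activity names and timestamps are hypothetical.

```python
from datetime import datetime

def remaining_times(trace):
    """For a completed trace given as a time-ordered list of
    (activity, timestamp) pairs, return (prefix_length, seconds_remaining)
    for every prefix of the trace."""
    end = trace[-1][1]
    return [
        (i + 1, (end - timestamp).total_seconds())
        for i, (_, timestamp) in enumerate(trace)
    ]

# Hypothetical trace from an event log.
trace = [
    ("receive order", datetime(2025, 1, 7, 9, 0)),
    ("check stock", datetime(2025, 1, 7, 9, 30)),
    ("ship order", datetime(2025, 1, 7, 11, 0)),
]
targets = remaining_times(trace)
```

A predictive model would be trained to estimate these values from features of each prefix, which is where an enriched activity representation would enter.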
A Framework for Explainable Root Cause Analysis in Manufacturing Systems – Combining Machine Learning, Explainable Artificial Intelligence and the Ishikawa Model for Industrial Manufacturing
(2025-01-07) Kiefer, Daniel; Straub, Tim; Bitsch, Günter; Van Dinther, Clemens
This paper proposes a novel framework, “Transparent Reasoning in Artificial intelligence Cause Explanation” (TRACE), that combines root cause analysis, explainable artificial intelligence, and machine learning in a way that is understandable for the worker. The goal is to enhance transparency, interpretability, and explainability in AI-driven decision-making processes as well as to increase the acceptance of AI within an industrial manufacturing area. The paper outlines the need for such a framework, describes the design process, and shows a preliminary mockup, a possible underlying software architecture, and an evaluation and integration plan in an industrial environment.
Towards Operational Excellence in Data Science: Designing a Process Guidance System to Support Data Science Process Execution
(2025-01-07) Rösl, Stefan; Auer, Thomas; Schieder, Christian
As data science (DS) becomes integral to business strategies, standardizing DS processes and improving their execution are becoming increasingly important. To address this, researchers have proposed several data science process models (DSPM). Despite their recognized efficiency gains, organizations are reluctant to adopt these models. A major challenge to the adoption of DSPM is the need for more process guidance. Our study introduces proDASC, a process guidance system (PGS) designed to support the execution of DS processes to facilitate the adoption of DSPM and promote its practical application. We employ a design science research approach to investigate DSPM implementation issues, derive design decisions from existing design principles, and develop and evaluate the innovative artifact proDASC. Our research presents a methodologically grounded PGS prototype that has the potential to improve the execution of the DS processes and enhance process knowledge. It supports the adoption of DSPM and expands the PGS knowledge base.
Introduction to the Minitrack on Data Science and Machine Learning to Support Business Decisions
(2025-01-07) Davazdahemami, Behrooz; Zolbanin, Hamed; Delen, Dursun
