Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences
Permanent URI for this collection: https://hdl.handle.net/10125/112436
Recent Submissions
Trajectory-Aware Topic Mining: A Domain-Adaptive and Geometry-Preserving Framework for Identifying Promising Technologies (2026-01-06)
Jo, Seokyeon; Kwon, Kyungmin; Lee, Hanjun
This study presents a trajectory-aware topic mining framework that addresses key limitations of existing trend analysis methods, including semantic information loss and parameter sensitivity. Leveraging domain-adaptive embeddings and geometry-preserving clustering, the framework preserves high-dimensional semantic structures while automatically determining optimal topic clusters. Self-attention mechanisms and cosine similarity enable accurate tracking of topic evolution over time. Additionally, two dynamic metrics, velocity and directional consistency, quantify topic momentum and stability, allowing early identification of emerging or declining research themes. Applied to 206,536 computer vision publications (2012–2024), the framework effectively reveals rapidly evolving and stable subfields. The proposed approach offers actionable insights for researchers, industry practitioners, and policymakers to inform strategic R&D investment and innovation planning.

Evaluation of Information Extraction Algorithms for Preserving Analogical Semantics within Knowledge Graphs (2026-01-06)
Combs, Kara; Champagne, Lance; Lemming, Grace; Bihl, Trevor
Analogical reasoning is a promising, lightweight approach to inference on novel data without prior training. However, algorithms with this methodology have historically relied on strict, human-defined schemas. To address this issue, this work proposes using information extraction algorithms to transform textual analogies into knowledge graphs (KGs), a more machine-friendly format. We compare the knowledge graphs created by four relation extractors, three co-reference resolvers, and two embedding models via Pearson's r coefficient, root mean squared error (RMSE), and the Wilcoxon signed-rank test.
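The trajectory-aware framework above tracks topic evolution with cosine similarity between topic embeddings across time slices. As a minimal illustrative sketch (the threshold value and greedy matching scheme here are assumptions, not the paper's exact procedure), successive topic centroids could be linked like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_topics(prev_centroids, curr_centroids, threshold=0.8):
    """Link each current topic to its most similar predecessor.

    Topics whose best similarity falls below the threshold are treated
    as newly emerging and left unlinked.
    """
    links = {}
    for j, centroid in enumerate(curr_centroids):
        sims = [cosine_similarity(centroid, p) for p in prev_centroids]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            links[j] = best
    return links
```

Chaining such links across consecutive time slices yields per-topic trajectories, over which velocity-style metrics can then be computed.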
We observe that the Wilcoxon signed-rank test provided the most streamlined result for algorithm and embedding selection, and that the extractors were more influential than the resolvers in creating KGs. OpenIE was the best-performing extractor, and the SBERT embeddings yielded the KGs that best preserved analogical structure. Future work should focus on additional statistical tests and a greater range of information extraction algorithms and embeddings.

A Domain-Adaptive Soft Prompting Framework for Multi-Type Bias Detection in News (2026-01-06)
Zhang, Chengjun; Ampel, Benjamin; Samtani, Sagar
Advances in Large Language Models (LLMs) have enabled new opportunities to automate media analysis and improve collaborative social cybersecurity. A key task is bias detection in news reporting, which is essential for promoting information fairness and reducing polarization. However, existing approaches often rely on supervised fine-tuning with labeled datasets and fail to capture domain-specific linguistic patterns, limiting scalability and generalization. To address this, we propose a lightweight, modular framework that combines domain-adaptive pretraining (DAP) with Masked Language Modeling (MLM) and soft prompt tuning to detect six types of media bias: framing, group, semantic properties, connotation, informational spin, and phrasing. Our framework leverages more than 401,000 New York Times articles from 2000 to 2024 to pretrain five LLMs, followed by bias prompting with a small amount of labeled data. The approach improves F1 by 7.6% and precision by 6.8% over hard prompts on average across the six bias types.
These results confirm that DAP with soft prompts is an efficient and scalable solution for bias-aware NLP in resource-constrained environments.

AutoTheme: A Multi-Agent Framework for Inductive Thematic Analysis with LLMs (2026-01-06)
Wahbeh, Abdullah; El-Gayar, Omar; Al-Ramahi, Mohammad; Nasralah, Tareq; Elnoshokaty, Ahmed
Thematic analysis is a qualitative research method used to identify and interpret patterns in textual data. However, it can be time-consuming and challenging to replicate. While recent advancements in large language models (LLMs) and generative AI have enhanced thematic analysis, existing methods often rely on prompt-based interactions and require significant human intervention. This paper introduces an agentic AI framework comprising autonomous, goal-directed agents powered by LLMs to perform inductive thematic analysis with minimal human input. We evaluate the framework using a dataset from a Cognitive Behavioral Therapy (CBT) mobile app and compare the results with those from Latent Dirichlet Allocation (LDA), demonstrating improved efficiency, adaptability, and thematic depth. Overall, the approach has shown efficacy, especially for short texts such as app reviews and social media posts.

Beyond RAG: A LLM-Based FAQ Matching Framework for Real-Time Decision Support in Contact Centers (2026-01-06)
Agrawal, Garima; Gummuluri, Sashank; Spera, Cosimo
In customer contact centers, human agents often face long average handling times (AHT) due to the need to manually interpret queries and search large knowledge bases (KBs). While retrieval-augmented generation (RAG) systems using large language models (LLMs) are increasingly adopted to support these tasks, they face limitations in real-time conversations, particularly with poorly formulated queries and repeated retrieval of frequently asked questions (FAQs).
To address these issues, we propose a decision support framework that extends beyond RAG by combining real-time question identification with a dual-threaded FAQ matching and generation system. If the query matches an FAQ, the answer is retrieved instantly; otherwise, a well-formed query is generated and routed to a RAG model. Deployed within Minerva CQ's human-agent assist platform, our solution delivers sub-2-second responses for matched queries, significantly reduces unnecessary RAG calls, and lowers operational costs. We also introduce an automated, LLM-agentic pipeline for mining FAQs from historical transcripts, enabling continuous improvement of the FAQ knowledge base in the absence of manually curated QA pairs.

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System (2026-01-06)
Akben, Mustafa; Satko, Aaron
Large language models (LLMs) offer promising opportunities for organizational research. However, their built-in moderation systems can create problems when researchers try to analyze harmful content, often refusing to follow certain instructions or producing overly cautious responses that undermine the validity of the results. This is particularly problematic when analyzing organizational conflicts such as microaggressions or hate speech. This paper introduces an Elo rating-based method that significantly improves LLM performance for harmful content analysis. In two datasets, one focused on microaggression detection and the other on hate speech, we find that our method outperforms traditional LLM prompting techniques and conventional machine learning models on key measures such as accuracy, precision, and F1 score. Advantages include better reliability when analyzing harmful content, fewer false positives, and greater scalability for large-scale datasets.
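The Elo-based harmful-content method above is described only at a high level in the abstract. As an illustration of the core mechanism, the standard Elo update that repeated pairwise LLM judgments could feed looks like this (how the paper pairs items and aggregates judgments is not specified, so treat the surrounding workflow as an assumption):

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Standard Elo update after one pairwise comparison.

    score_a is 1.0 if item A is judged to "win" the comparison
    (e.g., rated more harmful), 0.5 for a tie, and 0.0 if item B wins;
    k controls the step size of the update.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

In such a scheme, every text starts from a common rating (e.g., 1000); repeated comparisons move ratings apart, and the final ratings rank texts by judged harmfulness rather than forcing a single absolute label per item.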
This approach supports organizational applications, including detecting workplace harassment, assessing toxic communication, and fostering safer and more inclusive work environments.

Large Language Models as Games (2026-01-06)
Bihl, Trevor
If computers are dramatic experiences, Large Language Models (LLMs) are games. Consumer AI once served mainly as a feature within larger products, but LLMs and generative tools have created a market for AI as a service. Their conversational design fosters engagement as both tools and entertainment. Prompting has become an art, with users directing LLMs like actors in a co-created performance. This paper identifies five generative interaction types: collaboration, exploration, adversarial, narrative, and roleplaying. From these, four gameplay modes emerge: interactive prompting, playing games with LLMs, using LLMs as in-game agents, and generating games via LLMs. These modes reflect principles of game design (challenge, reward, strategy) embedded in LLM interactions. Whether used for productivity or play, prompting creates a feedback-rich, exploratory experience. This paper argues that LLM use inherently constitutes a form of generative play grounded in interactive storytelling.

Detection of Contradictions and Inconsistencies in Regulatory Documents using Prompt-Engineering (2026-01-06)
Schumann, Gerrit; Marx Gómez, Jorge; Pargmann, Hergen
Companies create regulatory documents, such as policies, standards, and guidelines, to define their processes and structures. Frequent updates to these documents can lead to inconsistencies and contradictions between the respective regulations, which can result in errors, delays, asset compromise, fraud, and non-compliance. Given the variety of document types and their thematic, structural, lexical, syntactic, and domain-specific differences, automated conflict detection remains a challenge, especially due to the lack of annotated data from practice.
As an alternative to supervised approaches, this paper investigates whether a prompt-based classifier can detect contradictions and inconsistencies between regulatory texts, and what level of accuracy can be achieved. The evaluation of three prompt variants and seven large language models on a real-world regulatory dataset shows that the detection accuracy of a prompt-based classifier (F1-score of 0.851), whose prompt includes 26 detailed rules, is only 1.29% lower than that of a supervised model (F1-score of 0.862) trained with annotated data.

Introduction to the Minitrack on Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences (2026-01-06)
Wu, Winston; Ranly, Neil; Langhals, Brent; Wagner, Torrey

Evaluating Summarization Quality of Locally Hosted 3-Billion Parameter Large Language Models (2026-01-06)
Tyndall, Erick; Wagner, Torrey; Ranly, Neil; Wu, Winston; Langhals, Brent
This study evaluates the summarization performance of 3-billion-parameter large language models that can run locally on consumer-grade hardware. Using a corpus of 1,000 articles from the XSum dataset, models from the LLaMA, Phi, and Qwen families generated single-sentence summaries with a unified zero-shot prompt. A total of 4,000 summaries, consisting of model-generated outputs and human-authored references, were analyzed using 74 extracted features capturing linguistic abstractiveness, extractiveness, and informativeness. From these, 41 metrics were selected for nonparametric statistical comparison to evaluate model performance relative to human-written summaries. Results show that Phi and LLaMA frequently outperformed the human baseline in informativeness and extractiveness, while struggling with abstraction. Qwen performed well in content retention but was less consistent overall.
These findings suggest that small-scale models can achieve near-human summarization quality on several metrics but still struggle with abstraction. The study underscores their promise for privacy-centric, resource-limited environments and the importance of transparent, multidimensional evaluation.
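Several of the studies above lean on nonparametric tests, notably the Wilcoxon signed-rank test, to compare paired model and human metric distributions. A minimal sketch of the test statistic for paired samples (tie handling among absolute differences and the p-value lookup are deliberately omitted, so this is a teaching sketch rather than a drop-in replacement for a statistics library):

```python
def wilcoxon_w(a, b):
    """Wilcoxon signed-rank test statistic W for paired samples a and b.

    Zero differences are dropped; remaining absolute differences are
    ranked, and W is the smaller of the positive-rank and negative-rank
    sums. A small W relative to the sample size suggests the paired
    samples differ. Ties among absolute differences are not averaged
    in this sketch.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = sum(r + 1 for r, i in enumerate(order) if diffs[i] > 0)
    w_minus = sum(r + 1 for r, i in enumerate(order) if diffs[i] < 0)
    return min(w_plus, w_minus)
```

In practice one would use a library implementation (e.g., scipy.stats.wilcoxon) to obtain the p-value; the sketch only shows where the statistic comes from.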
