Trustworthy Artificial Intelligence and Machine Learning
Permanent URI for this collection: https://hdl.handle.net/10125/112567
Recent Submissions
An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face (2026-01-06)
Jia, Nan; Raja, Anita; Khatchadourian, Raffi
As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministic, run-time-defined semantics of ML complicate traditional software refactoring. We define semantic preservation in LESS as the property that optimizations of intelligent components do not alter the system's overall functional behavior. This paper introduces an empirical framework to evaluate semantic preservation in LESS by mining model evolution data from Hugging Face. We extract commit histories, Model Cards, and performance metrics from a large number of models. To establish baselines, we conduct case studies in three domains, tracing performance changes across versions. Our analysis demonstrates how semantic drift can be detected via evaluation metrics across commits and reveals common refactoring patterns through commit-message analysis. Although API constraints prevented estimating a full-scale threshold, our pipeline offers a foundation for defining community-accepted boundaries for semantic preservation. Our contributions include: (1) a large-scale dataset of ML model evolution, curated from 1.7 million Hugging Face entries via a reproducible pipeline using the native HF Hub API; (2) a practical pipeline for evaluating semantic preservation across a subset of 536 models and 4000+ metrics; and (3) empirical case studies illustrating semantic drift in practice.
Together, these contributions advance the foundations for more maintainable and trustworthy ML systems.

MAEBE: Multi-Agent Emergent Behavior Framework (2026-01-06)
Gothard, Tim; Erisken, Sinem; Leitgab, Martin; Potham, Ram
Explainability findings from evaluations of isolated large language models (LLMs) likely do not transfer to multi-agent AI ensembles (MAS), which introduce novel emergent interaction and decision-making behaviors. To systematically assess differences in decision behavior between isolated and ensemble agents, we present the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework. Using MAEBE with the Greatest Good Benchmark, a double-inversion question technique, and explainability analysis, we demonstrate that: (1) decision preferences are brittle in MAS LLM ensembles, much as in isolated LLMs, shifting significantly with changes to question framing; (2) ensemble behavior is not directly predictable from isolated agent behavior due to emergent group dynamics; and (3) ensembles exhibit phenomena such as peer pressure influencing decision convergence, even when guided by a supervisor. Our findings underscore the necessity of evaluating the explainability of multi-agent AI systems in their interactive context to properly assess MAS-generated results, with potential implications for AI safety and alignment.

Optimizing Class Distributions for Bias-Aware Multi-Class Learning (2026-01-06)
Felske, Mirco; Stiene, Stefan
We propose BiCDO (Bias-Controlled Class Distribution Optimizer), an iterative, data-centric framework that identifies Pareto-optimized class distributions for multi-class image classification. BiCDO enables performance prioritization for specific classes, which is useful in safety-critical scenarios (e.g., prioritizing 'Human' over 'Dog').
Unlike a uniform distribution, BiCDO determines the optimal number of images per class to enhance reliability and minimize bias and variance in the objective function. BiCDO can be incorporated into existing training pipelines with minimal code changes and supports any labelled multi-class dataset. We have validated BiCDO with EfficientNet, ResNet, and ConvNeXt on the CIFAR-10 and iNaturalist21 datasets, demonstrating improved, balanced model performance through optimized data distribution.

Introduction to the Minitrack on Trustworthy Artificial Intelligence and Machine Learning (2026-01-06)
Desantis, Derek; Pouchard, Line; Salhofer, Peter
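The first abstract above describes detecting semantic drift by comparing evaluation metrics across model versions. A minimal sketch of that idea, assuming per-version metrics are already extracted as dictionaries; the metric names and the tolerance threshold are illustrative assumptions, not values from the paper:

```python
def metric_drift(baseline, current, tolerance=0.01):
    """Compare two versions' evaluation metrics.

    Returns the metrics whose absolute change exceeds the tolerance,
    mapped to their signed change (a hypothetical drift criterion).
    """
    drifted = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is not None and abs(new - old) > tolerance:
            drifted[name] = new - old
    return drifted


# Example: an accuracy change within tolerance is kept; an F1 drop is flagged.
base = {"accuracy": 0.91, "f1": 0.88}
curr = {"accuracy": 0.905, "f1": 0.83}
print(metric_drift(base, curr))
```

A community-accepted tolerance per metric, as the abstract suggests, would replace the single hard-coded value here.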
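The BiCDO abstract above describes choosing a non-uniform number of images per class, with some classes prioritized. A minimal, hypothetical sketch of one such data-centric reallocation step; the weighting rule, function name, and numbers are illustrative assumptions and not the authors' algorithm:

```python
def allocate_class_counts(budget, error_rates, priorities, available):
    """Split a total sample budget across classes.

    Classes with higher error and higher priority receive more samples,
    clipped to the number of images actually available (a simple
    stand-in for one iteration of a bias-aware distribution search).
    """
    scores = {c: error_rates[c] * priorities[c] for c in error_rates}
    total = sum(scores.values())
    counts = {}
    for cls, score in scores.items():
        raw = budget * score / total
        counts[cls] = min(available[cls], max(1, round(raw)))
    return counts


# Example: 'Human' is prioritized 2x, so it draws a larger share.
counts = allocate_class_counts(
    budget=1000,
    error_rates={"Human": 0.10, "Dog": 0.05, "Cat": 0.05},
    priorities={"Human": 2.0, "Dog": 1.0, "Cat": 1.0},
    available={"Human": 10_000, "Dog": 10_000, "Cat": 10_000},
)
print(counts)
```

BiCDO itself searches for Pareto-optimized distributions iteratively; this sketch only shows the kind of per-class count a single weighted step might produce.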
