Artificial Intelligence Security: Ensuring Safety, Trustworthiness, and Responsibility in AI Systems
Recent Submissions
Item: Domain Anchorage in GPT-4: A Computational Linguistic Analysis of Lexicographic Profiling and Its Implications for Unintended Information Dissemination (2025-01-07)
Challappa, Lekha; Zhang, Jenevieve; Garg, Rajiv
Our study expands upon recent work explaining in-context learning as implicit Bayesian inference, where language models infer shared latent concepts from examples. We analyze GPT-4's semantic attention post-domain priming, using computational linguistics to quantify response similarity to lexicographically independent queries with the same intent. We assess potential privacy breaches from inadvertent domain anchorage, examining how attention and embedding layers process linguistic patterns. We hypothesize that domain-specific words receiving higher gradient updates can introduce bias, create semantic echo chambers, and oversimplify relationships. Grounded in Mohamed Zakaria Kurdi's frameworks, this research uses lexical, semantic, syntactic, and positional similarities to analyze GPT-4's vector transformations and attention distributions. By simulating domain-specific interactions through declarative primes and interrogative inputs, we highlight significant privacy and ethical concerns, as the model may share information across users due to domain anchorage.
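The analysis described in the item above hinges on comparing model responses to differently worded queries that share the same intent. As a rough illustration only, and not the authors' pipeline, the Python sketch below compares two responses using token-level Jaccard overlap as a lexical measure and cosine similarity of sentence embeddings as a semantic proxy; the sentence-transformers model name and both metrics are assumptions standing in for the paper's lexical, semantic, syntactic, and positional measures.

```python
# Illustrative sketch only: compares two model responses elicited by
# lexicographically independent prompts with the same intent. The Jaccard and
# embedding-cosine measures below are stand-ins, not the paper's metrics.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

def lexical_similarity(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def semantic_similarity(a: str, b: str) -> float:
    """Cosine similarity of sentence embeddings as a rough semantic proxy."""
    ea, eb = _encoder.encode([a, b])
    return float(np.dot(ea, eb) / (np.linalg.norm(ea) * np.linalg.norm(eb)))

# Example: responses to two differently worded queries with the same intent.
r1 = "The portfolio should be rebalanced toward low-volatility assets."
r2 = "Shifting holdings into less volatile assets is advisable."
print(lexical_similarity(r1, r2), semantic_similarity(r1, r2))
```

High similarity across lexicographically independent prompts would be consistent with the domain-anchorage effect the authors describe, though any real measurement should use the paper's own metrics and thresholds.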
Item: Managing Trustworthiness in Advanced Autonomous Systems (2025-01-07)
Sarathy, Sriprakash; Chin, Shiu-Kai; Young, William; Wolf, Marilyn; Oh, J-C; Qiu, Qinru; Marquez, Marlon; Angiolini, R.; Mazzacane, Anthony
Cyber-physical systems that are employed as part of a decision-making framework require a measure of behavior assurance to characterize their function. Further, such an assurance measure needs to accommodate both safety and security considerations in the design and implementation of such components. A key measure relevant to the emergence of data-driven and learning-enabled components is trustworthiness. Trustworthiness is of paramount importance for all safety- and security-relevant systems, and of particular significance for autonomous systems operating without a human pilot or operator in the control-feedback loop. Further, emerging technologies such as data-driven ML and AI-based approaches, which represent quantum jumps in tactical mission capabilities, demand a measure of confidence prior to their integration into existing decision frameworks. While we may be able to characterize an AI system's capabilities by observing its behavior, we cannot fully understand its limitations or the potential negative capabilities that are not evident in the training data, opening up the possibility of unintended behaviors and thus a degree of unpredictability. We present a systems engineering methodology, based on mission engineering coupled with system-theoretic analysis, that provides a formal representation of system-level security properties which can be mapped to lower-level subsystem specifications and verified using conventional approaches. We address some of the underlying computing platform requirements necessary to ensure the successful implementation of safe, secure, trustworthy, and resilient autonomous operations. This paper focuses on factors that impact trustworthiness, methods to estimate and assess it, design approaches that integrate and support enhanced trustworthiness, and methods to verify and validate trustworthiness-relevant system requirements. Some relevant examples are provided in the context of an autonomous aircraft system and its relevant subsystems.

Item: A Conceptual Model of Trust in Generative AI Systems (2025-01-07)
Tahmasbi, Nargess; Rastegari, Elham; Truong, Minh
Generative Artificial Intelligence (GAI) significantly impacts various sectors, offering innovative solutions in consultation, self-education, and creativity. However, the trustworthiness of GAI outputs is questionable due to the absence of theoretical correctness guarantees and the opacity of Artificial Intelligence (AI) processes. These issues, compounded by potential biases and inaccuracies, pose challenges to GAI adoption. This paper delves into the trust dynamics in GAI, highlighting its unique capabilities to generate novel outputs and adapt over time, distinct from traditional AI. We introduce a model that analyzes trust in GAI through user experience, operational capabilities, contextual factors, and task types. This work aims to enrich the theoretical discourse and practical approaches in GAI, setting a foundation for future research and applications.

Item: Improving Stability Estimates in Adversarial Explainable AI through Alternate Search Methods (2025-01-07)
Burger, Christopher; Walter, Charles
Advances in the effectiveness of machine learning models have come at the cost of enormous complexity, resulting in a poor understanding of how they function. Local surrogate methods have been used to approximate the workings of these complex models, but recent work has revealed their vulnerability to adversarial attacks in which the explanation produced is appreciably different while the meaning and structure of the complex model's output remain similar. This prior work has focused on the existence of these weaknesses but not on their magnitude. Here we explore an alternate search method aimed at finding minimum viable perturbations: the fewest perturbations necessary to achieve a fixed similarity value between the explanations of the original and altered text. Intuitively, an explainability method that requires fewer perturbations to expose a given level of instability is inferior to one that requires more. This nuance allows for superior comparisons of the stability of explainability methods.
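The item above centers on estimating the fewest perturbations needed to drive an explanation to a fixed similarity value. The sketch below is one possible greedy formulation of that search, not the authors' algorithm; `explain`, `candidate_swaps`, the top-k feature overlap, and the default thresholds are hypothetical placeholders.

```python
# Illustrative sketch only: a greedy search for a minimum-viable-perturbation
# count. `explain` and `candidate_swaps` are hypothetical placeholders for a
# local surrogate explainer and a word-substitution generator; the paper's
# actual search procedure may differ.
from typing import Callable, List

def top_k_overlap(e1: List[str], e2: List[str], k: int = 5) -> float:
    """Similarity between two explanations, taken as overlap of their top-k features."""
    return len(set(e1[:k]) & set(e2[:k])) / k

def minimum_viable_perturbations(
    text: List[str],                              # tokenized input
    explain: Callable[[List[str]], List[str]],    # returns ranked feature names
    candidate_swaps: Callable[[str], List[str]],  # substitutes for one token
    target_similarity: float = 0.4,
    max_edits: int = 10,
) -> int:
    """Greedily swap one token at a time until the explanation's similarity to the
    original falls to the target value; return the number of edits used."""
    original_expl = explain(text)
    current = list(text)
    for edits in range(1, max_edits + 1):
        best = None  # (similarity, position, replacement)
        for i, token in enumerate(current):
            for sub in candidate_swaps(token):
                trial = current[:i] + [sub] + current[i + 1:]
                sim = top_k_overlap(original_expl, explain(trial))
                if best is None or sim < best[0]:
                    best = (sim, i, sub)
        if best is None:
            break  # no candidate substitutions were available
        current[best[1]] = best[2]
        if best[0] <= target_similarity:
            return edits
    return max_edits  # target similarity not reached within the edit budget
```

Under this framing, an explanation method for which the returned count is small is easier to destabilize, matching the abstract's intuition that needing fewer perturbations indicates a less stable method.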
Item: Generalized Loss-Function-Based Attacks for Object Detection Models (2025-01-07)
Datta, Soumil; Walter, Charles
As artificial intelligence (AI) systems become increasingly integrated into daily life, the robustness of these systems, particularly object detection models, has gained substantial attention. Object detection is crucial in applications ranging from autonomous driving to surveillance. However, these models are vulnerable to adversarial attacks, which can deceive them into making incorrect predictions. This paper introduces a novel approach to generating inference-time adversarial attacks on object detection models using generalized loss functions. We present the Generalized Targeted Object Attacks (GTOA) and the Generalized Heuristic Object Suppression Technique (GHOST) algorithms, which perform targeted and vanishing attacks, respectively. Our method is highly adaptable, allowing attacks on any object detection model with minimal model adjustments. We demonstrate that our generalized loss-function-based attacks are effective across various object detection models, highlighting the need for enhanced robustness in AI systems. (An illustrative sketch of a generic loss-based vanishing attack appears after the final item below.)

Item: Introduction to the Minitrack on Artificial Intelligence Security: Ensuring Safety, Trustworthiness, and Responsibility in AI Systems (2025-01-07)
Brooks, Tyson; Chin, Shiu-Kai; Young, William; Devendorf, Erich
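Returning to the Generalized Loss-Function-Based Attacks item above (the sketch referenced there): the following is a generic projected-gradient vanishing attack that suppresses detection confidence by minimizing a simple confidence-sum loss. It is not the GTOA or GHOST algorithm; the `score_fn` hook, the loss choice, and the hyperparameters are assumptions made for illustration.

```python
# Illustrative sketch only: a generic L-infinity projected-gradient "vanishing"
# attack. `score_fn` is a hypothetical hook that maps an image tensor to the
# detector's per-candidate confidence scores; GTOA/GHOST are not reproduced here.
import torch

def vanishing_attack(image: torch.Tensor,
                     score_fn,              # callable: image -> confidence scores
                     epsilon: float = 8 / 255,
                     alpha: float = 1 / 255,
                     steps: int = 20) -> torch.Tensor:
    """Return a perturbed copy of `image` whose detection confidences are suppressed."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = score_fn(adv).sum()          # generalized loss: total detection confidence
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()                       # step down the confidence
            adv = image + (adv - image).clamp(-epsilon, epsilon)  # project into the L-inf ball
            adv = adv.clamp(0.0, 1.0).detach()                    # keep a valid image
    return adv
```

A targeted variant in the spirit of the abstract could instead maximize the score of an attacker-chosen class, but the specific generalized loss functions used by GTOA and GHOST are defined in the paper itself.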