Guardrail Vulnerabilities in Open-Source Language Models: Implications for Democratic Discourse and Marginalized Communities
Loading...
Files
Date
Authors
Contributor
Advisor
Editor
Performer
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Interviewee
Narrator
Transcriber
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Journal Name
Volume
Number/Issue
Starting Page
6792
Ending Page
Alternative Title
Abstract
The proliferation of open-source Large Language Models (LLMs) presents a complex technological phenomenon with significant societal implications. While these models democratize access to advanced Natural Language Processing (NLP) capabilities, they simultaneously amplify risks for marginalized communities who often bear the disproportionate burden of technological misuse. Our research examines systematic vulnerabilities in guardrail mechanisms across seven prominent open-source LLMs, revealing patterns of harmful content generation that threaten democratic discourse and social cohesion. Through empirical analysis using advanced NLP classification methods, we demonstrate that popular open-source models consistently generate content classified as hateful or offensive when subjected to adversarial prompting techniques. These findings directly contradict the safety assurances provided by model developers, particularly Meta AI's stated commitment that their systems should present balanced perspectives on debated policy issues rather than singular viewpoints.
Description
Citation
DOI
Extent
10 pages
Format
Type
Conference Paper
Geographic Location
Time Period
Related To
Proceedings of the 59th Hawaii International Conference on System Sciences
Related To (URI)
Table of Contents
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Catalog Record
Local Contexts
Collections
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.
