Cybersecurity in the Age of Artificial Intelligence, AI for Cybersecurity, and Cybersecurity for AI

Permanent URI for this collection: https://hdl.handle.net/10125/112403


Recent Submissions

  • Automatic Extraction of Protected Health Information from Multilingual Hacker Communities
    (2026-01-06) Dacosta, Cade; Ampel, Benjamin; Hashim, Matthew; Chen, Hsinchun
    Protected Health Information (PHI, e.g., electronic health records, insurance information) is increasingly stolen in data breaches by malicious actors who intend to sell it in hacker communities. These actors often shield themselves by advertising the content and availability of PHI data on encrypted messaging platforms (e.g., Telegram and Discord). However, the extent and nature of these PHI discussions are not well known. Therefore, in this research, we propose a Named Entity Recognition Framework for PHI (NERF-PHI) to systematically analyze PHI-related hacker conversations. We collected more than three million multilingual hacker posts from Discord servers and Telegram groups, translated the conversations to English with open-source machine translation tools, and extracted information related to vulnerable individuals and medical entities. Our results suggest that encoder-based Large Language Models show significant promise for extracting PHI-related information from hacker communities and can be used by cybersecurity professionals and law enforcement to combat PHI misuse. Our study is also one of the first comprehensive analyses of multilingual PHI discussions in hacker communities.
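At its core, the extraction step NERF-PHI describes is named-entity tagging of PHI mentions in (translated) posts. A minimal rule-based sketch of that step, using regex patterns as an illustrative stand-in for the paper's encoder-based NER model — the entity labels, patterns, and example post here are all hypothetical, not drawn from the study:

```python
import re

# Hypothetical PHI entity patterns -- a rule-based stand-in for an
# encoder-based NER model; labels and formats are illustrative only.
PHI_PATTERNS = {
    "INSURANCE_ID": re.compile(r"\b[A-Z]{3}\d{6}\b"),
    "EMAIL": re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"),
    "DATE_OF_BIRTH": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def extract_phi_entities(post: str) -> list[tuple[str, str]]:
    """Return (label, matched span) pairs for PHI-like mentions in a post."""
    entities = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(post):
            entities.append((label, match.group()))
    return entities

# Fabricated example post (already machine-translated to English).
post = "Selling records: DOB 04/12/1987, policy ABC123456, contact x@mail.com"
print(extract_phi_entities(post))
```

A trained encoder model generalizes far beyond fixed patterns (e.g., personal names and free-text medical entities), which is why the paper evaluates LLM-based taggers rather than rules; the sketch only shows the input/output shape of the task.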
  • Malicious Attack Challenges and Mitigation Strategies for Large Code Models: A Survey on Data Poisoning, Adversarial Attacks, and Backdoor Vulnerabilities
    (2026-01-06) Lin, Dongqing; Huangfu, Luwen; Liao, Chunhua; Chung, Brian; Gowda, Akul; Brettin, Thomas
    The rapid proliferation of Large Code Models (LCMs), driven by advancements in Large Language Models (LLMs), has revolutionized automated code generation and completion. However, their widespread adoption introduces significant security risks, including data poisoning, adversarial attacks, and backdoor vulnerabilities. This survey comprehensively reviews the LCM security landscape, drawing on more than 200 recent papers to identify and categorize threats to code generation techniques, and summarizes five mainstream mitigation strategies: Model Hardening, Data Sanitization, Adversarial Training, Security Alignment, and Evaluation Datasets. Uniquely, this work applies Evolutionary Game Theory (EGT) to conceptualize LCM security as a continuous "arms race" between attackers and defenders, where the effectiveness of a specific strategy serves as a fitness indicator. Our analysis reveals that while defense techniques have advanced, balancing the robustness and functionality of LCMs remains a persistent challenge. Our findings underscore the need for standardized security benchmarks and real-time threat monitoring to ensure the safety of LCM-powered software.
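The EGT framing the survey adopts is commonly modeled with replicator dynamics: a strategy's population share grows when its fitness exceeds the population average. A minimal sketch under assumed conditions — the two strategy names come from the survey's taxonomy, but the payoff values, step size, and iteration count are illustrative assumptions, not results from the survey:

```python
def replicator_step(shares, payoffs, dt=0.1):
    """One Euler step of replicator dynamics: each strategy's share changes
    in proportion to how far its payoff sits above the population average."""
    avg = sum(s * p for s, p in zip(shares, payoffs))
    new = [s + dt * s * (p - avg) for s, p in zip(shares, payoffs)]
    total = sum(new)  # renormalize to guard against numerical drift
    return [s / total for s in new]

# Two defender strategies: [Data Sanitization, Adversarial Training],
# with hypothetical fitness values against a fixed attacker population.
shares = [0.5, 0.5]
payoffs = [0.3, 0.7]  # assumed: adversarial training is more effective here
for _ in range(50):
    shares = replicator_step(shares, payoffs)
print(shares)  # the fitter strategy's share grows over time
```

In the survey's terms, the payoff vector is the "fitness indicator" of each mitigation strategy; as attackers adapt, the payoffs themselves shift, which is what makes the interaction a continuous arms race rather than a one-shot game.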