Machine Learning and AI: Cybersecurity and Threat Hunting
Permanent URI for this collection: https://hdl.handle.net/10125/107578
Recent Submissions
Gradient Coupling Effect of Poisoning Attacks in Federated Learning (2024-01-03)
Wei, Wenqi; Liu, Ling
Poisoning attacks are a dominant threat in distributed learning, where the mediator has limited control over the distributed clients contributing to the joint model. In this paper, we present a comprehensive study of the coupling effect of poisoning attacks from three perspectives. First, we identify the theoretical foundation of the weak coupling phenomenon of gradient eigenvalues under poisoning attacks. Second, we analyze the behavior of gradient coupling under four scenarios: an adaptive attacker, skewed client selection, non-IID data distribution, and different gradient window sizes, and we study when the weak coupling effect fails as an attack indicator. Last, we examine the coupling effect by revisiting several existing poisoning mitigation approaches. Through formal analysis and extensive empirical evidence, we show under what conditions the weak coupling effect of poisoning attacks can serve as forensic evidence for attack mitigation in federated learning and how it interacts with existing defenses.

Defense Against Adversarial Attacks for Neural Representations of Text (2024-01-03)
Zhan, Huixin; Zhang, Kun; Chen, Zhong; Sheng, Victor
In this paper, we focus on defending against adversarial attacks for privacy-preserving Natural Language Processing (NLP) under a model-partitioning scenario, where the model is split into a local, on-device part and a remote, cloud-based part. Model partitioning improves scalability and protects the privacy of inputs to the model. However, we argue that this privacy protection breaks down during inference: an adversary eavesdrops on the hidden representations output by the local devices and tries to use those representations to recover private information from the input text. We study two types of adversarial attacks, adversarial classification and adversarial generation, and correspondingly propose two defenses: defending against adversarial classification (DAC) and defending against adversarial generation (DAG). Both DAC and DAG are bilevel optimization-based defenses that optimally modify a subpopulation of the neural representations so as to maximally degrade the adversary's ability. Representations trained with this bilevel optimization protect sensitive information from the adversarial attack while maintaining their utility for downstream tasks. Our experiments show that both DAC and DAG improve the performance of the main text classifier and achieve higher privacy of neural representations than current state-of-the-art methods.
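As a rough illustration of the gradient eigen-coupling idea in the Wei and Liu abstract above, the following is a minimal sketch assuming the indicator is each client's alignment with the dominant eigenvector of the window's gradient Gram matrix; the paper's actual construction is not reproduced here, and the data are synthetic.

```python
# Hypothetical gradient eigen-coupling indicator for one federated-learning round window.
import numpy as np

def coupling_scores(client_grads: np.ndarray) -> np.ndarray:
    """client_grads: (num_clients, num_params) gradient updates for one window."""
    gram = client_grads @ client_grads.T          # client-by-client Gram matrix
    _, eigvecs = np.linalg.eigh(gram)             # eigenvalues ascending; columns are eigenvectors
    dominant = eigvecs[:, -1]                     # shared (strongly coupled) gradient direction
    # A small absolute weight means the client's gradient is weakly coupled to the
    # dominant direction -- the kind of signal the abstract treats as an attack indicator.
    return np.abs(dominant)

rng = np.random.default_rng(0)
honest = rng.normal(2.0, 1.0, size=(9, 128))      # correlated honest updates (toy)
poisoned = rng.normal(0.0, 5.0, size=(1, 128))    # an outlying, hypothetical poisoned update
print(np.round(coupling_scores(np.vstack([honest, poisoned])), 3))
```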
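In a similarly hedged spirit, here is a toy sketch of the bilevel flavor behind DAC/DAG from the Zhan et al. abstract: perturb only a chosen subpopulation of representation dimensions to degrade a simulated eavesdropper while preserving main-task utility. The module shapes, the entropy-based adversary objective, and the stand-in classifiers are illustrative assumptions, not the paper's formulation.

```python
import torch

def defend_representation(h, main_clf, adv_clf, y_main, protected_dims,
                          steps=20, lr=0.1, lam=1.0):
    """Perturb only `protected_dims` of h: keep main-task loss low while
    raising the adversary's predictive uncertainty (an assumed stand-in objective)."""
    delta = torch.zeros_like(h, requires_grad=True)
    mask = torch.zeros(h.shape[1])
    mask[protected_dims] = 1.0                       # edit only a subpopulation of dimensions
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        h_def = h + delta * mask
        loss_main = torch.nn.functional.cross_entropy(main_clf(h_def), y_main)
        p_adv = torch.softmax(adv_clf(h_def), dim=-1)
        adv_entropy = -(p_adv * p_adv.clamp_min(1e-9).log()).sum(dim=-1).mean()
        loss = loss_main - lam * adv_entropy         # preserve utility, hurt the eavesdropper
        opt.zero_grad(); loss.backward(); opt.step()
    return (h + delta * mask).detach()

h = torch.randn(32, 16)                              # toy on-device hidden representations
main_clf = torch.nn.Linear(16, 4)                    # stand-ins for pretrained classifiers
adv_clf = torch.nn.Linear(16, 2)
y_main = torch.randint(0, 4, (32,))
h_defended = defend_representation(h, main_clf, adv_clf, y_main, protected_dims=[0, 3, 7])
```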
Behavioral Malware Detection using a Language Model Classifier Trained on sys2vec Embeddings (2024-01-03)
Carter, John; Mancoridis, Spiros; Protopapas, Pavlos; Galinkin, Erick
Behavioral malware detection is an effective way to detect ever-changing malware. Often, kernel-level system calls are collected on device, then processed and fed to machine learning models. In this work, we show that applying simple natural language processing (NLP) techniques to system calls, such as a bag-of-n-grams model coupled with shallow machine learning classifiers, is not as useful against stealthier malware. In contrast, training a Word2Vec-like model, which we call sys2vec, on the system call traces and feeding the resulting embeddings to a language model classifier provides consistently better results. We evaluate and compare the two classifiers using the Area Under the Receiver Operating Characteristic Curve (AUC) and the True Positive Rate (TPR) at an acceptable False Positive Rate (FPR). We then discuss how this work can be further expanded in the language model space going forward.

IoT Malware Data Augmentation using a Generative Adversarial Network (2024-01-03)
Carter, John; Mancoridis, Spiros; Protopapas, Pavlos; Galinkin, Erick
Behavioral malware detection has been shown to be an effective method for detecting malware running on computing hosts. Machine learning (ML) models are often used for this task; they use representative behavioral data from a device to classify whether an observation is malicious. Although these models can perform well, machine learning models in security are often trained on imbalanced datasets that yield poor real-world efficacy because they favor the overrepresented class. We therefore need a way to augment the underrepresented class. Common data augmentation techniques include SMOTE, data resampling/upsampling, and generative algorithms. In this work, we explore generative algorithms for this task and compare the results with those obtained using SMOTE and upsampling. Specifically, we feed the less-represented class of data into a Generative Adversarial Network (GAN) to create enough realistic synthetic data to balance the dataset. We show how using a GAN to balance a dataset that favors benign data helps a shallow neural network achieve a higher Area Under the Receiver Operating Characteristic Curve (AUC) and a lower False Positive Rate (FPR).

Introduction to the Minitrack on Machine Learning and AI: Cybersecurity and Threat Hunting (2024-01-03)
Kayhan, Varol; Shivendu, Shivendu; Agrawal, Manish; Zeng, David
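To make the two pipelines contrasted in the sys2vec abstract (Carter et al.) concrete, here is a hedged sketch of the bag-of-n-grams baseline and a Word2Vec-style embedding step over toy system-call traces. The traces, hyperparameters, and the downstream language-model classifier (only hinted at in a comment) are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from gensim.models import Word2Vec

traces = [["open", "read", "mmap", "close"],        # toy benign trace
          ["fork", "execve", "ptrace", "write"]]    # toy malicious trace
labels = [0, 1]

# Baseline from the abstract: bag-of-n-grams over call tokens plus a shallow classifier.
vec = CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+")
X = vec.fit_transform(" ".join(t) for t in traces)
LogisticRegression().fit(X, labels)

# sys2vec-like step: learn call embeddings directly from the traces.
w2v = Word2Vec(sentences=traces, vector_size=32, window=3, min_count=1, epochs=50)
embedded_trace = np.stack([w2v.wv[call] for call in traces[0]])
print(embedded_trace.shape)   # (4, 32) -- a sequence a language-model classifier could consume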
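For the GAN-based augmentation paper (Carter et al.), the following is a minimal sketch of generating synthetic minority-class (malicious) feature vectors with a small GAN; the architecture, dimensions, training schedule, and data are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

feat_dim, noise_dim = 20, 8
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

minority = torch.randn(64, feat_dim) + 1.5     # stand-in for real malicious feature vectors

for _ in range(200):
    # Discriminator step: real minority samples vs. generator output.
    fake = G(torch.randn(64, noise_dim)).detach()
    d_loss = bce(D(minority), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to fool the discriminator.
    g_loss = bce(D(G(torch.randn(64, noise_dim))), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Sample enough synthetic malicious observations to balance the training set.
synthetic = G(torch.randn(1000, noise_dim)).detach()
print(synthetic.shape)  # torch.Size([1000, 20])
```

For the SMOTE and upsampling baselines the abstract mentions, imbalanced-learn's SMOTE and scikit-learn's resample are the usual starting points.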
