Behavioral Malware Detection using a Language Model Classifier Trained on sys2vec Embeddings
Date
2024-01-03
Starting Page
7582
Abstract
Behavioral malware detection is an effective way to detect ever-changing malware. Often, kernel-level system calls are collected on the device, processed, and fed to machine learning models. In this work, we show that simple natural language processing (NLP) techniques applied to system calls, such as a bag-of-n-grams model coupled with shallow machine learning classifiers, are less effective against stealthier malware. In contrast, training a Word2Vec-like model, which we call sys2vec, on the system call traces and feeding the resulting embeddings to a language model classifier yields consistently better results. We evaluate and compare the two classifiers using the Area Under the Receiver Operating Characteristic Curve (AUC) and the True Positive Rate (TPR) at an acceptable False Positive Rate (FPR). We then discuss how this work can be extended in the language model space.
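As a rough illustration of two ideas named in the abstract, the sketch below counts n-grams over a system call trace and computes AUC and TPR at a capped FPR from classifier scores. It is not the paper's implementation. The function names (`bag_of_ngrams`, `auc`, `tpr_at_fpr`), the toy trace, the labels, the scores, and the FPR cap of 0.25 are all assumptions chosen for illustration; the abstract does not state which FPR the authors treat as acceptable.

```python
from collections import Counter


def bag_of_ngrams(trace, n=2):
    """Count contiguous n-grams of system call names in one trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def _split_scores(labels, scores):
    """Separate scores into malicious (label 1) and benign (label 0) lists."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc(labels, scores):
    """AUC as the fraction of (malicious, benign) pairs ranked correctly.

    Ties between a malicious and a benign score count as half a win.
    """
    pos, neg = _split_scores(labels, scores)
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels, scores, max_fpr):
    """Best TPR over all thresholds whose FPR does not exceed max_fpr."""
    pos, neg = _split_scores(labels, scores)
    best = 0.0  # a threshold above every score gives TPR = FPR = 0
    for t in sorted(set(scores)):
        # A sample is flagged as malicious when its score is >= t.
        fpr = sum(s >= t for s in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in pos) / len(pos)
            best = max(best, tpr)
    return best


# Toy data, invented for illustration only.
toy_trace = ["open", "read", "read", "close"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = malicious, 0 = benign
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.1]

print(bag_of_ngrams(toy_trace, n=2))
print(auc(toy_labels, toy_scores))
print(tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.25))
```

Reporting TPR at a capped FPR alongside AUC reflects a practical constraint: a detector that raises too many false alarms is hard to deploy, so the detection rate at a tolerable false-alarm rate can matter more than ranking quality averaged over all thresholds.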
Keywords
Machine Learning and AI: Cybersecurity and Threat Hunting, attention, behavioral malware detection, GRU, language models, machine learning
Extent
10 pages
Related To
Proceedings of the 57th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International