Behavioral Malware Detection using a Language Model Classifier Trained on sys2vec Embeddings

Date

2024-01-03

Starting Page

7582

Abstract

Behavioral malware detection is an effective way to detect ever-changing malware. Often, kernel-level system calls are collected on-device, then processed and fed to machine learning models. In this work, we show that simple natural language processing (NLP) techniques applied to system calls, such as a bag-of-n-grams model coupled with shallow machine learning classifiers, are less useful for detecting stealthier malware. In contrast, training a Word2Vec-like model, which we call sys2vec, on the system call traces and feeding the resulting embeddings to a language model classifier provides consistently better results. We evaluate and compare the two classifiers using the Area Under the Receiver Operating Characteristic Curve (AUC) and the True Positive Rate (TPR) at an acceptable False Positive Rate (FPR). We then discuss how this work can be expanded further in the language model space.
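
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `tpr_at_fpr`, the toy labels and scores, and the FPR budgets are all assumptions chosen for illustration, and the record does not state which FPR the paper treats as acceptable.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (malicious, benign) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malicious and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels: Sequence[int], scores: Sequence[float], max_fpr: float) -> float:
    """Highest TPR over all score thresholds whose FPR does not exceed max_fpr."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malicious and one benign sample")
    best = 0.0
    # Try every distinct score as a threshold; a sample is flagged if score >= threshold.
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in pos) / len(pos)
            best = max(best, tpr)
    return best


# Toy data (hypothetical): 1 = malicious trace, 0 = benign trace.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]

print(auc(labels, scores))               # 0.875
print(tpr_at_fpr(labels, scores, 0.0))   # 0.5
print(tpr_at_fpr(labels, scores, 0.25))  # 1.0
```

On this toy data the single high-scoring benign trace halves the detection rate when no false positives are tolerated, which is why a TPR at a fixed FPR can separate classifiers whose AUC values look similar.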

Keywords

Machine Learning and AI: Cybersecurity and Threat Hunting, attention, behavioral malware detection, gru, language models, machine learning

Extent

10 pages

Related To

Proceedings of the 57th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
