sysBERT: Improved Behavioral Malware Detection using BERT Trained on sys2vec Embeddings

Date

2025-01-07

Contributor

Advisor

Department

Instructor

Depositor

Speaker

Researcher

Consultant

Interviewer

Narrator

Transcriber

Annotator

Journal Title

Journal ISSN

Volume Title

Publisher

Volume

Number/Issue

Starting Page

7120

Ending Page

Alternative Title

Abstract

As malware becomes increasingly stealthy and more difficult to detect, behavioral malware detection has become the preferred method of detection, which uses representative run-time data from the device to determine if an infection has occurred. In this work, we collected kernel-level system calls from a router serving IoT devices during periods of benign behavior and periods of known malware infection. The system calls were processed using our custom-trained sys2vec model, which created contextual embeddings for each system call observed. We then subjected the data to a classifier using a Gated Recurrent Unit (GRU) with an Attention layer. Although this pipeline performed well for noisy, easy-to-detect malware, it struggled with stealthier malware. To combat this, we trained a classifier that uses a custom-trained BERT encoder in place of the GRU/Attention layers, which results in much better detection at a usable false positive rate (FPR) ≤ 1 × 10−5.

Description

Keywords

Cyber Operations, Defense, and Forensics, behavioral malware detection, bert, language models, machine learning

Citation

Extent

10

Format

Geographic Location

Time Period

Related To

Proceedings of the 58th Hawaii International Conference on System Sciences

Related To (URI)

Table of Contents

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Rights Holder

Local Contexts

Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.