The Effect of Training Set Timeframe on the Future Performance of Machine Learning-based Malware Detection Models

Date

2021-01-05

Contributor

Advisor

Department

Instructor

Depositor

Speaker

Researcher

Consultant

Interviewer

Narrator

Transcriber

Annotator

Journal Title

Journal ISSN

Volume Title

Publisher

Volume

Number/Issue

Starting Page

857

Ending Page

Alternative Title

Abstract

The occurrence of previously unseen malicious code or malware is an implicit and ongoing issue for all software-based systems. It has been recognized that machine learning, applied to features statically extracted from binary executable files, offers a number of promising benefits, such as its ability to detect malware that has not been previously encountered. Nevertheless it is understood that these models will not continue to perform equally well over time as new and potentially less recognizable malwares occur. In this study, we have applied a range of machine learning models to the features extracted from a large collection of software executables in Portable Executable format ordered by the date the binary was first encountered, consisting of both malware and benign examples, whilst considering different training set configurations and timeframes. We analyze and quantify the relative performance deterioration of these machine learning models on future test sets of these features, and discuss some insights into the characteristics and rate of machine learning-based malware detection performance deterioration and training set selection.

Description

Keywords

Accountability, Evaluation, and Obscurity of AI Algorithms, artificial intelligence, machine learning, malware, model maintenance

Citation

Extent

10 pages

Format

Geographic Location

Time Period

Related To

Proceedings of the 54th Hawaii International Conference on System Sciences

Related To (URI)

Table of Contents

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Rights Holder

Local Contexts

Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.