The Effect of Training Set Timeframe on the Future Performance of Machine Learning-based Malware Detection Models
Files
Date
2021-01-05
Authors
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Narrator
Transcriber
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
857
Ending Page
Alternative Title
Abstract
The occurrence of previously unseen malicious code or malware is an implicit and ongoing issue for all software-based systems. It has been recognized that machine learning, applied to features statically extracted from binary executable files, offers a number of promising benefits, such as its ability to detect malware that has not been previously encountered. Nevertheless it is understood that these models will not continue to perform equally well over time as new and potentially less recognizable malwares occur. In this study, we have applied a range of machine learning models to the features extracted from a large collection of software executables in Portable Executable format ordered by the date the binary was first encountered, consisting of both malware and benign examples, whilst considering different training set configurations and timeframes. We analyze and quantify the relative performance deterioration of these machine learning models on future test sets of these features, and discuss some insights into the characteristics and rate of machine learning-based malware detection performance deterioration and training set selection.
Description
Keywords
Accountability, Evaluation, and Obscurity of AI Algorithms, artificial intelligence, machine learning, malware, model maintenance
Citation
Extent
10 pages
Format
Geographic Location
Time Period
Related To
Proceedings of the 54th Hawaii International Conference on System Sciences
Related To (URI)
Table of Contents
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Local Contexts
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.