Title: BERT-Cuckoo15: A Comprehensive Framework for Malware Detection Using 15 Dynamic Feature Types
Authors: Rabadi, Dima; Y. Loo, Jia; G. Teo, Sin
Dates: 2024-12-26; 2024-12-26; 2025-01-07
ISBN: 978-0-9981331-8-8
Identifier: b2f2034b-3161-431e-baad-397fff1930ea
URI: https://hdl.handle.net/10125/108883
Extent: 10
License: Attribution-NonCommercial-NoDerivatives 4.0 International
Track: Cybersecurity in the Age of Artificial Intelligence, AI for Cybersecurity, and Cybersecurity for AI
Keywords: malware detection; bert; natural language processing; transformers; feature generation; malicious behavior analysis; dynamic analysis
Type: Conference Paper

Abstract: Malware detection presents significant challenges because features must be selected from diverse data sources, such as system calls and registry keys, and that selection affects model accuracy. Existing techniques often rely on a single feature type to reduce the number of features, or they require extensive feature engineering, and so may fail to capture intricate relationships between features. Moreover, these methods usually assume that features are independent, which does not hold for complex malware behavior. Despite their success, their reliance on handcrafted features and their inability to fully leverage contextual information limit their effectiveness against sophisticated malware. To address these constraints, we introduce BERT-Cuckoo15, a malware detection model that leverages Bidirectional Encoder Representations from Transformers (BERT) to analyze relationships between diverse features derived from the dynamic analysis of samples in the Cuckoo sandbox. The model processes and encodes these features into chunks, allowing contextual information to be aggregated across different system activities. Our evaluation, conducted on a balanced dataset of 36,770 samples across nine malware types, demonstrates the efficacy of our approach. BERT-Cuckoo15 achieves an accuracy of 97.61%, showcasing its ability to capture complex feature interdependencies and improve malware detection accuracy.
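
The abstract says that features are encoded into chunks and that contextual information is aggregated across them, but this record does not give the details. The following is a minimal sketch of one common way to do this, not the authors' implementation. The names `chunk_tokens` and `aggregate_scores`, the default chunk size of 510 tokens (BERT's usual 512-position limit minus the two special tokens), the optional overlap, and the mean aggregation are all assumptions made for illustration. The per-chunk scorer is a hypothetical stand-in for a fine-tuned BERT classifier.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence, chunk_size: int = 510, overlap: int = 0) -> List[list]:
    """Split a long token sequence into fixed-size windows.

    chunk_size and overlap are illustrative assumptions, not values
    taken from the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def aggregate_scores(chunks: Sequence[Sequence], score_chunk: Callable[[Sequence], float]) -> float:
    """Average per-chunk scores into one sample-level score.

    Mean pooling is an assumption; the paper may aggregate differently.
    """
    if not chunks:
        raise ValueError("no chunks to aggregate")
    return fmean(score_chunk(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Hypothetical token stream from a sandbox report (made-up tokens).
    report_tokens = ["NtCreateFile", "RegSetValueExA", "NtWriteFile"] * 400

    # Toy scorer standing in for a BERT classifier's malicious-class probability.
    def toy_scorer(chunk):
        return sum(tok.startswith("Reg") for tok in chunk) / len(chunk)

    windows = chunk_tokens(report_tokens, chunk_size=510, overlap=50)
    print(len(windows), aggregate_scores(windows, toy_scorer))
```

Overlapping windows are one way to keep activity that straddles a chunk boundary visible to at least one chunk, at the cost of scoring some tokens twice.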