HoneyCode: Automating Deceptive Software Repositories with Deep Generative Models

File      Size       Format
0679.pdf  656.08 kB  Adobe PDF

Item Summary

Title: HoneyCode: Automating Deceptive Software Repositories with Deep Generative Models
Authors: Nguyen, David; Liebowitz, David; Nepal, Surya; Kanhere, Salil
Keywords: Cyber Operations, Defence, and Forensics; cyber defence; generative models; neural networks
Date Issued: 05 Jan 2021
Abstract: We propose HoneyCode, an architecture for the generation of synthetic software repositories for cyber deception. The synthetic repositories have the characteristics of real software, including language features, file names and extensions, but contain no real intellectual property. The fake software can be used as a honeypot or form part of a deceptive environment. Existing approaches to software repository generation lack scalability due to reliance on hand-crafted structures for specific languages. Our approach is language agnostic and learns the underlying representations of repository structures, filenames and file content through a novel Tree Recurrent Network (TRN) and two recurrent networks (RNN) respectively. Each stage of the sequential generation process utilises features from prior steps, which increases the honey repository’s authenticity and consistency. Experiments show TRN generates tree samples that reduce degree mean maximal distance (MMD) by 90-92% and depth MMD by 75-86% to a held-out test data set in comparison to recent deep graph generators and a baseline random tree generator. In addition, our RNN models generate convincing filenames with authentic syntax and realistic file content.
Pages/Duration: 10 pages
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Appears in Collections: Cyber Operations, Defence, and Forensics

This item is licensed under a Creative Commons License.