Interpretability of API Call Topic Models: An Exploratory Study

File: 0640.pdf (710.77 kB, Adobe PDF)

Item Summary

Title: Interpretability of API Call Topic Models: An Exploratory Study
Authors: Glendowne, Puntitra
Glendowne, Dae
Keywords: Machine Learning and Cyber Threat Intelligence and Analytics
api call
malware analysis
malware behaviors
topic model
Date Issued: 07 Jan 2020
Abstract: Topic modeling is an unsupervised method for discovering semantically coherent combinations of words, called topics, in unstructured text. However, the human interpretability of topics discovered from non-natural-language corpora, specifically Windows API call logs, is unknown. Our objective is to explore the coherence of topics and their ability to represent the themes of API calls from malware analysts' perspective. Three Latent Dirichlet Allocation (LDA) models were fit to a collection of dynamic API call logs, and the resulting topics, or behavioral themes, were manually evaluated by malware analysts. The results were compared to existing automated quality measures. Participants were able to accurately identify API calls that did not belong in the behavioral themes learned by the 20-topic model. Our results agree with topic coherence measures on which topics are most interpretable, but not with log-perplexity; this concurs with the findings of the topic-evaluation literature on natural language corpora.
Pages/Duration: 10 pages
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Appears in Collections: Machine Learning and Cyber Threat Intelligence and Analytics

