Classifying Malware Using Function Representations in a Static Call Graph

Paper Title

Classifying Malware Using Function Representations in a Static Call Graph

Authors

Thomas Dalton, Mauritius Schmidtler, Alireza Hadj Khodabakhshi

Abstract

We propose a deep learning approach for identifying malware families using the function call graphs of x86 assembly instructions. Although prior work on static call graph analysis exists, little of it applies modern, principled feature learning techniques to the problem. In this paper, we introduce a system that uses an executable's function call graph, where function representations are obtained with a recurrent neural network (RNN) autoencoder that maps sequences of x86 instructions to dense latent vectors. These function embeddings are then modeled as vertices in a graph whose edges indicate call dependencies. Capturing both rich node-level representations and global topological properties of an executable file greatly improves malware family detection rates, and it offers a more principled approach to the problem that deliberately avoids tedious feature engineering and reliance on domain expertise. We test our approach with several experiments on a Microsoft malware classification data set and achieve excellent separation between malware families, with a classification accuracy of 99.41%.
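
The pipeline described in the abstract (embed each function's instruction sequence, attach the embeddings to the call graph's vertices, then derive an executable-level representation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the opcode vocabulary, the 4-dimensional latent size, the toy functions and call edges, and the helper names `encode`, `propagate`, and `graph_vector` are all hypothetical. The RNN uses random, untrained weights and stands in only for the encoder half of the autoencoder (the decoder and training loop are omitted). The single round of neighbour averaging followed by mean pooling is a simple placeholder for whatever graph model the paper actually uses.

```python
import math
import random

# Hypothetical opcode vocabulary; the paper's actual tokenization is not given here.
VOCAB = {"push": 0, "mov": 1, "call": 2, "ret": 3, "xor": 4, "jmp": 5}
HIDDEN = 4  # latent dimension, chosen only for illustration

rng = random.Random(0)  # fixed seed so the sketch is deterministic


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Elman RNN encoder weights (untrained stand-in for the autoencoder's encoder).
W_x = rand_matrix(HIDDEN, len(VOCAB))  # input-to-hidden
W_h = rand_matrix(HIDDEN, HIDDEN)      # hidden-to-hidden


def encode(instructions):
    """Map a sequence of opcode mnemonics to a fixed-size vector (final hidden state)."""
    h = [0.0] * HIDDEN
    for op in instructions:
        idx = VOCAB[op]
        # With a one-hot input, W_x @ x reduces to column idx of W_x.
        h = [math.tanh(W_x[i][idx] + sum(W_h[i][j] * h[j] for j in range(HIDDEN)))
             for i in range(HIDDEN)]
    return h


# Hypothetical executable: function name -> opcode sequence.
functions = {
    "start": ["push", "mov", "call", "call", "ret"],
    "decrypt": ["mov", "xor", "xor", "jmp", "ret"],
    "connect": ["push", "mov", "call", "ret"],
}
# Directed call edges: caller -> callees.
calls = {"start": ["decrypt", "connect"], "decrypt": [], "connect": []}


def propagate(emb, calls):
    """One round of averaging each vertex with its callers and callees."""
    neighbours = {f: set() for f in emb}
    for caller, callees in calls.items():
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours[callee].add(caller)
    out = {}
    for f, vec in emb.items():
        group = [vec] + [emb[n] for n in sorted(neighbours[f])]
        out[f] = [sum(v[i] for v in group) / len(group) for i in range(HIDDEN)]
    return out


def graph_vector(emb):
    """Mean-pool vertex embeddings into one executable-level feature vector."""
    n = len(emb)
    return [sum(v[i] for v in emb.values()) / n for i in range(HIDDEN)]


node_emb = {f: encode(ops) for f, ops in functions.items()}  # vertex features
node_emb = propagate(node_emb, calls)                          # mix in call structure
features = graph_vector(node_emb)                              # whole-executable vector
print(len(features))  # 4
```

In a full system, `features` (or the vertex embeddings themselves, if a graph neural network consumes them directly) would be passed to a trained malware-family classifier; that step is not shown.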