Paper title
Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison
Paper authors
Abstract
The large wave of Internet of Things (IoT) malware reflects the fragility of the current IoT ecosystem. Research has found that IoT malware can spread quickly across devices with different processor architectures, which draws our attention to cross-architecture binary similarity comparison. The goal of binary similarity comparison is to determine whether the semantics of two binary snippets are similar. Existing learning-based approaches usually learn the representation of each binary code snippet individually and perform similarity matching based on a distance metric, without considering inter-binary semantic interactions. Moreover, they often rely on large-scale external code corpora to pre-train instruction embeddings, which is heavyweight and prone to the out-of-vocabulary (OOV) problem. In this paper, we propose Inter-BIN, an interaction-based cross-architecture IoT binary similarity comparison system. Our key insight is to introduce interaction between instruction sequences through a co-attention mechanism, which can flexibly perform soft alignment of semantically related instructions from different architectures. We also design a lightweight instruction embedding method based on multi-feature fusion, which avoids the heavy workload and the OOV problem of previous approaches. Extensive experiments show that Inter-BIN significantly outperforms state-of-the-art approaches on cross-architecture binary similarity comparison tasks at different input granularities. Furthermore, we present CrossMal, an IoT malware function matching dataset collected from real network environments, containing 1,878,437 cross-architecture reuse function pairs. Experimental results on CrossMal show that Inter-BIN is practical and scalable on real-world binary similarity comparison collections.
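To make the idea of soft alignment through co-attention more concrete, the following is a minimal, generic sketch in plain Python. It is not Inter-BIN's actual architecture, which the abstract does not specify. It assumes each instruction is already a fixed-size embedding vector, uses a plain dot product as the affinity score, and all names (`co_attention`, `attend`, `x86_seq`, `arm_seq`) and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def co_attention(seq_a, seq_b):
    """Compute attention weights in both directions between two sequences.

    a_to_b[i][j]: weight instruction i of seq_a places on instruction j of seq_b.
    b_to_a[j][i]: weight instruction j of seq_b places on instruction i of seq_a.
    Assumption: affinity is a plain dot product (a real model would learn it).
    """
    affinity = [[dot(a, b) for b in seq_b] for a in seq_a]
    # Normalize each row: every seq_a instruction distributes weight over seq_b.
    a_to_b = [softmax(row) for row in affinity]
    # Normalize each column: every seq_b instruction distributes weight over seq_a.
    b_to_a = [
        softmax([affinity[i][j] for i in range(len(seq_a))])
        for j in range(len(seq_b))
    ]
    return a_to_b, b_to_a


def attend(weights, values):
    """For each weight row, return the weighted sum of the value vectors."""
    dim = len(values[0])
    return [
        [sum(w * v[k] for w, v in zip(row, values)) for k in range(dim)]
        for row in weights
    ]


# Toy, hand-made embeddings standing in for two instruction sequences
# compiled for different architectures (not real instruction embeddings).
x86_seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
arm_seq = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

a_to_b, b_to_a = co_attention(x86_seq, arm_seq)
x86_aligned = attend(a_to_b, arm_seq)  # seq_b content softly aligned to each seq_a slot
arm_aligned = attend(b_to_a, x86_seq)  # seq_a content softly aligned to each seq_b slot
```

In this toy setup each instruction places most of its weight on the counterpart whose embedding points in the same direction, regardless of position in the sequence. This is the sense in which the alignment is "soft": every pair gets a nonzero weight instead of a hard one-to-one match.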