Paper Title
Comparison of Entropy Calculation Methods for Ransomware Encrypted File Identification
Paper Authors
Paper Abstract
Ransomware is a class of malicious software that uses encryption to attack system availability. The target's data remains encrypted and is held captive by the attacker until a ransom demand is met. A common approach used by many crypto-ransomware detection techniques is to monitor file system activity and attempt to identify encrypted files being written to disk, often using a file's entropy as an indicator of encryption. However, descriptions of these techniques often include little or no discussion of why a particular entropy calculation technique was selected, or any justification for choosing it over the alternatives. The Shannon method of entropy calculation is the most commonly used technique for identifying encrypted files in crypto-ransomware detection. Correctly encrypted data should be indistinguishable from random data, so apart from the standard mathematical entropy calculations such as Chi-Square, Shannon Entropy and Serial Correlation, the test suites used to validate the output of pseudo-random number generators should also be suited to this analysis. The hypothesis is that there is a fundamental difference between different entropy methods and that the best methods may be used to better detect ransomware-encrypted files. The paper compares the accuracy of 53 distinct tests in differentiating between encrypted data and other file types. The testing is broken down into two phases: the first identifies potential candidate tests, and the second evaluates these candidates thoroughly. To ensure that the tests are sufficiently robust, the NapierOne dataset is used. This dataset contains thousands of examples of the most commonly used file types, as well as examples of files that have been encrypted by crypto-ransomware.
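To make two of the measures named in the abstract concrete, the following is a minimal Python sketch of a byte-level Shannon entropy and a Pearson chi-square statistic against a uniform byte distribution. It is an illustration only and not the paper's implementation: the function names `shannon_entropy` and `chi_square`, and the choice to work on whole byte strings rather than file fragments, are assumptions made here.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the byte values that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def chi_square(data: bytes) -> float:
    """Pearson chi-square statistic against a uniform distribution over 256 byte values."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    # Sum over all 256 possible byte values, including those that never occur.
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


if __name__ == "__main__":
    text = b"the quick brown fox " * 1000   # structured, low-entropy data
    random_like = os.urandom(len(text))     # stand-in for well-encrypted data
    for label, blob in (("text", text), ("random", random_like)):
        print(f"{label}: entropy={shannon_entropy(blob):.3f} bits/byte, "
              f"chi-square={chi_square(blob):.1f}")
```

Random-looking input should score close to 8 bits per byte with a comparatively small chi-square value, while plain text scores much lower on entropy and far higher on chi-square. Compressed formats can also approach 8 bits per byte, which is one reason a comparison across several tests, as the abstract describes, is of interest.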