SkipFuzz: Active Learning-based Input Selection for Fuzzing Deep Learning Libraries

Title

SkipFuzz: Active Learning-based Input Selection for Fuzzing Deep Learning Libraries

Authors

Kang, Hong Jin; Rattanukul, Pattarakrit; Haryono, Stefanus Agus; Nguyen, Truong Giang; Ragkhitwetsagul, Chaiyong; Pasareanu, Corina; Lo, David

Abstract

Many modern software systems are enabled by deep learning libraries such as TensorFlow and PyTorch. As deep learning is now prevalent, the security of deep learning libraries is a key concern. Fuzzing deep learning libraries presents two challenges. First, to reach the functionality of the libraries, fuzzers have to use inputs from the valid input domain of each API function, which may be unknown. Second, many inputs are redundant: randomly sampled invalid inputs are unlikely to trigger new behaviors. While existing approaches partially address the first challenge, they overlook the second. We propose SkipFuzz, an approach for fuzzing deep learning libraries. To generate valid inputs, SkipFuzz learns the input constraints of each API function using active learning. Using information gained during fuzzing, SkipFuzz infers a model of the input constraints and thus generates valid inputs. SkipFuzz comprises an active learner that queries a test executor to obtain feedback for inference. After constructing hypotheses, the active learner poses queries and refines the hypotheses using the feedback from the test executor, which indicates whether the library accepts or rejects an input, i.e., whether the input satisfies the input constraints. Inputs from different categories are used to invoke the library to check whether a set of inputs satisfies a function's input constraints. Inputs in one category are distinguished from those in other categories by the possible input constraints they would satisfy, e.g., being tensors of a certain shape. As such, SkipFuzz is able to refine its hypothesis by eliminating possible candidates of the input constraints. This active learning-based approach addresses the challenge of redundant inputs. Using SkipFuzz, we have found and reported 43 crashes. Of these, 28 have been confirmed, with 13 unique CVEs assigned.
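The refinement loop described in the abstract (hypotheses, queries to a test executor, elimination of candidate constraints) can be illustrated with a toy candidate-elimination learner. This is a minimal sketch under stated assumptions, not SkipFuzz's implementation: it assumes an input constraint is a conjunction of boolean properties, the category names and property names are hypothetical, and the "test executor" is simulated by a plain function rather than by calling TensorFlow or PyTorch. A category is queried only when the surviving hypotheses disagree about whether it will be accepted; otherwise its input is skipped as redundant.

```python
from itertools import combinations

# Hypothetical input categories: each one is characterized by the set of
# candidate properties its inputs satisfy (names are illustrative only).
CATEGORIES = {
    "int_scalar": frozenset({"is_int", "rank0", "non_negative"}),
    "neg_int_scalar": frozenset({"is_int", "rank0"}),
    "float_matrix": frozenset({"is_tensor", "rank2", "is_float"}),
    "int_matrix": frozenset({"is_tensor", "rank2", "is_int", "non_negative"}),
    "float_vector": frozenset({"is_tensor", "rank1", "is_float"}),
}


def accepts(required, category_props):
    """Return True iff an input with `category_props` satisfies every
    property in the conjunctive constraint `required`."""
    return required <= category_props


def learn_constraint(categories, oracle):
    """Candidate elimination over conjunctive input constraints.

    The hypothesis space is every subset of the known properties. Categories
    are considered in order; a category is sent to the oracle (the simulated
    test executor) only if the surviving hypotheses disagree on its outcome,
    and hypotheses inconsistent with the observed outcome are discarded.
    """
    props = sorted(set().union(*categories.values()))
    hypotheses = [frozenset(c) for r in range(len(props) + 1)
                  for c in combinations(props, r)]
    queried = []
    for name, cat_props in categories.items():
        predictions = {accepts(h, cat_props) for h in hypotheses}
        if len(predictions) == 1:
            continue  # all hypotheses agree, so executing this input teaches nothing
        outcome = oracle(cat_props)  # True = library accepted the input
        queried.append(name)
        hypotheses = [h for h in hypotheses if accepts(h, cat_props) == outcome]
    return hypotheses, queried


if __name__ == "__main__":
    # Hidden "true" constraint of a hypothetical API function.
    true_constraint = frozenset({"is_tensor", "rank2"})
    hyps, queried = learn_constraint(
        CATEGORIES, lambda p: accepts(true_constraint, p))
    print("queried:", queried)
    print("remaining:", [sorted(h) for h in hyps])
```

With the hidden constraint `{is_tensor, rank2}`, the learner skips `neg_int_scalar`: once `int_scalar` is rejected, no surviving hypothesis predicts that a subset of its properties will be accepted. It ends with two hypotheses (`{rank2}` and `{is_tensor, rank2}`) that these five categories cannot tell apart. Enumerating every subset of properties is exponential, so this exhaustive version only suits a toy example like this one.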
