Paper Title
Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models
Paper Authors
Paper Abstract
Detecting bugs in Deep Learning (DL) libraries (e.g., TensorFlow/PyTorch) is critical for ensuring the effectiveness/safety of almost all downstream DL systems for end users. Meanwhile, traditional fuzzing techniques can hardly be effective in such a challenging domain, since input DL programs need to satisfy both the input language (e.g., Python) syntax/semantics and the DL API input/shape constraints for tensor computations. To address these limitations, we propose TitanFuzz - the first approach to directly leverage Large Language Models (LLMs) to generate input programs for fuzzing DL libraries. LLMs are titanic models trained on billions of code snippets and can auto-regressively generate human-like code. Our key insight is that the training corpora of modern LLMs also include numerous code snippets that invoke DL library APIs; LLMs can therefore implicitly learn both the language syntax/semantics and the intricate DL API constraints needed to generate valid DL programs. More specifically, we use both generative and infilling LLMs (e.g., Codex/InCoder) to generate and mutate valid/diverse input DL programs for fuzzing. Our experimental results demonstrate that TitanFuzz achieves 30.38%/50.84% higher code coverage than state-of-the-art fuzzers on TensorFlow/PyTorch. Furthermore, TitanFuzz detects 65 bugs, 41 of which have already been confirmed as previously unknown bugs. This paper demonstrates that modern titanic LLMs can be leveraged to directly perform both generation-based and mutation-based fuzzing, techniques studied for decades, while being fully automated, generalizable, and applicable to domains that challenge traditional approaches (such as DL systems). We hope TitanFuzz can stimulate more work in this promising direction of LLMs for fuzzing.
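For readers unfamiliar with generation-based fuzzing of DL libraries, the sketch below illustrates the core loop the abstract describes: prompt a code LLM to write a program exercising a target API, run the candidate in a fresh interpreter, and flag abnormal terminations. This is a minimal illustration, not TitanFuzz's actual implementation: the `llm_generate` stub is a hypothetical placeholder for a real generative model such as Codex, and the pass/exception/crash/timeout triage is an assumed classification scheme.

```python
# Minimal sketch of LLM-driven, generation-based fuzzing of a DL library.
# Simplified for illustration; not TitanFuzz's actual implementation.
import os
import subprocess
import sys
import tempfile


def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a generative code LLM (e.g., Codex).
    In practice this would call the model's completion API with `prompt`
    and return the sampled program text."""
    # Placeholder completion so the sketch runs end to end.
    return prompt + "\nout = torch.relu(torch.randn(3, 4))\nprint(out.shape)\n"


def run_candidate(program: str, timeout: int = 10) -> str:
    """Execute a generated program in a fresh interpreter and classify the
    outcome as 'pass', 'exception', 'crash', or 'timeout'."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    finally:
        os.unlink(path)
    if proc.returncode == 0:
        return "pass"
    # A negative return code means the process was killed by a signal
    # (e.g., a segfault inside native library code) -- a likely bug.
    # A positive code usually means an ordinary Python exception
    # (including ModuleNotFoundError if the library is not installed).
    return "crash" if proc.returncode < 0 else "exception"


def fuzz_api(api_name: str, num_samples: int = 5) -> None:
    """Prompt the LLM to write programs exercising `api_name`, then run each."""
    prompt = f'"""Write a PyTorch program that uses {api_name}."""\nimport torch'
    for i in range(num_samples):
        program = llm_generate(prompt)
        outcome = run_candidate(program)
        print(f"sample {i}: {outcome}")


if __name__ == "__main__":
    fuzz_api("torch.nn.functional.relu")
```

The mutation-based half of the pipeline, which the abstract attributes to infilling LLMs such as InCoder, would fit into the same loop: mask a span of a previously generated seed program, ask the infilling model to refill it, and triage the mutant with the same `run_candidate` check. That step is omitted here for brevity.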