Paper Title

Fuzzing Deep Learning Compilers with HirGen

Authors

Haoyang Ma, Qingchao Shen, Yongqiang Tian, Junjie Chen, Shing-Chi Cheung

Abstract

Deep Learning (DL) compilers are widely adopted to optimize advanced DL models for efficient deployment on diverse hardware. Their quality has a profound effect on the quality of compiled DL models. A recent bug study shows that the optimization of high-level intermediate representation (IR) is the most error-prone compilation stage: bugs in this stage account for 44.92% of all the bugs collected. However, existing testing techniques do not consider features related to high-level optimization (e.g., high-level IR), and are therefore weak at exposing bugs in this stage. To bridge this gap, we propose HirGen, an automated testing technique that aims to effectively expose coding mistakes in the optimization of high-level IR. The design of HirGen includes 1) three coverage criteria to generate diverse and valid computational graphs; 2) full use of the language features of high-level IRs to generate diverse IRs; and 3) three test oracles inspired by both differential testing and metamorphic testing. HirGen has successfully detected 21 bugs in TVM, of which 17 have been confirmed and 12 fixed. Further, we construct four baselines using the state-of-the-art DL compiler fuzzers that can cover the high-level optimization stage. Our experimental results show that HirGen can detect 10 crashes and inconsistencies within 48 hours that the baselines cannot detect. We further validate the usefulness of our proposed coverage criteria and test oracles in the evaluation.
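To make the idea of a differential-testing oracle concrete, the following is a minimal sketch and not HirGen's actual implementation. It assumes a toy tuple-based expression IR and a toy optimization pass; the names `evaluate`, `optimize`, and `differential_oracle` are hypothetical and do not come from the paper or from TVM. The oracle runs the unoptimized and optimized forms of the same expression on random inputs and reports whether their results agree.

```python
import math
import random

# Toy IR (an assumption for illustration only):
#   ("var", name) | ("const", value) | ("add", a, b) | ("mul", a, b)

def evaluate(expr, env):
    """Interpret a toy IR expression under the variable bindings in env."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    return left + right if kind == "add" else left * right

def optimize(expr):
    """Toy high-level pass: fold constants and drop x + 0 and x * 1."""
    if expr[0] in ("var", "const"):
        return expr
    op, a, b = expr[0], optimize(expr[1]), optimize(expr[2])
    if a[0] == "const" and b[0] == "const":
        return ("const", a[1] + b[1] if op == "add" else a[1] * b[1])
    identity = 0.0 if op == "add" else 1.0
    if b == ("const", identity):
        return a
    if a == ("const", identity):
        return b
    return (op, a, b)

def differential_oracle(expr, variables, optimizer=optimize, trials=100, seed=0):
    """Return True if the original and optimized expressions agree on random inputs."""
    rng = random.Random(seed)
    optimized = optimizer(expr)
    for _ in range(trials):
        env = {name: rng.uniform(-10.0, 10.0) for name in variables}
        expected = evaluate(expr, env)
        actual = evaluate(optimized, env)
        if not math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-9):
            return False  # inconsistency: a candidate optimization bug
    return True

# (x * 1) + (2 + 3): the toy pass should reduce this to x + 5.
example = ("add",
           ("mul", ("var", "x"), ("const", 1.0)),
           ("add", ("const", 2.0), ("const", 3.0)))
```

Calling `differential_oracle(example, ["x"])` checks the toy pass against the unoptimized interpreter. Passing a deliberately wrong `optimizer` shows how such an oracle would flag an inconsistency. A real DL compiler fuzzer would compare tensor outputs across optimization levels or backends instead of scalar values.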