AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization

Paper Title

AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization

Authors

Xu, Zhiying; Peng, Hongding; Wang, Wei

Abstract

Traditional deep learning compilers rely on heuristics for subgraph generation, which impose extra constraints on graph optimization, e.g., each subgraph can only contain at most one complex operator. In this paper, we propose AGO, a framework for graph optimization with arbitrary structures to boost the inference performance of deep models by removing such constraints. To create new optimization opportunities for complicated subgraphs, we propose intensive operator fusion, which can effectively stitch multiple complex operators together for better performance. Further, we design a graph partitioning scheme that allows an arbitrary structure for each subgraph while guaranteeing the acyclic property among all generated subgraphs. Additionally, to enable efficient performance tuning on complicated subgraphs, we devise a novel divide-and-conquer tuning mechanism to orchestrate different system components. Through extensive experiments on various neural networks and mobile devices, we show that our system can improve the inference performance by up to 3.3x when compared with state-of-the-art deep compilers.
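
The acyclic property mentioned in the abstract can be illustrated with a small check. The sketch below is not AGO's partitioning scheme, which the abstract does not describe in detail. It only shows what the property means: if every subgraph is contracted to a single node, the resulting graph must remain a DAG so the subgraphs can be scheduled in order. The function name `partition_is_acyclic`, the operator names, and the subgraph ids are all hypothetical.

```python
from collections import defaultdict, deque


def partition_is_acyclic(edges, assignment):
    """Return True if contracting each subgraph to one node leaves no cycle.

    edges: iterable of (src_op, dst_op) pairs of the operator DAG.
    assignment: dict mapping each operator name to a subgraph id.
    """
    succ = defaultdict(set)
    indegree = {sg: 0 for sg in set(assignment.values())}

    # Build the contracted graph, ignoring edges inside a subgraph.
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a != b and b not in succ[a]:
            succ[a].add(b)
            indegree[b] += 1

    # Kahn's algorithm: every subgraph is visited only if there is no cycle.
    queue = deque(sg for sg, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        sg = queue.popleft()
        visited += 1
        for nxt in succ[sg]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(indegree)


# Hypothetical three-operator chain: conv1 -> relu -> conv2.
edges = [("conv1", "relu"), ("relu", "conv2")]

# Grouping conv1 with conv2 but leaving relu outside creates a cycle
# between the two subgraphs (0 -> 1 via conv1->relu, 1 -> 0 via relu->conv2).
bad = {"conv1": 0, "relu": 1, "conv2": 0}

# Placing all three operators in one subgraph keeps the partition acyclic
# and puts two complex operators (the convolutions) in the same subgraph.
fused = {"conv1": 0, "relu": 0, "conv2": 0}

bad_ok = partition_is_acyclic(edges, bad)
fused_ok = partition_is_acyclic(edges, fused)
```

Here `bad_ok` is `False` and `fused_ok` is `True`. The second partition is the kind of subgraph that the one-complex-operator heuristic would rule out, even though it does not violate acyclicity.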
