Title
AsyncTaichi: On-the-fly Inter-kernel Optimizations for Imperative and Spatially Sparse Programming
Authors
Abstract
Leveraging spatial sparsity has become a popular approach to accelerate 3D computer graphics applications. Spatially sparse data structures and efficient sparse kernels (such as parallel stencil operations on active voxels) are key to achieving high performance. Existing work focuses on improving performance within a single sparse computational kernel. We show that a system that looks beyond a single kernel, plus additional domain-specific sparse data structure analysis, opens up an exciting new space for optimizing sparse computations. Specifically, we propose a domain-specific data-flow graph model of imperative and sparse computation programs, which describes kernel relationships and enables easy analysis and optimization. Combined with an asynchronous execution engine that exposes a wide window of kernels, the inter-kernel optimizer can then perform effective sparse computation optimizations, such as eliminating unnecessary voxel list generations and removing voxel activation checks. These domain-specific optimizations further make way for classical general-purpose optimizations that are otherwise challenging to apply directly to computations with sparse data structures. Without any modification to the computational code, our new system leads to $4.02\times$ fewer kernel launches and a $1.87\times$ speedup on our GPU benchmarks, including computations on Eulerian grids, Lagrangian particles, meshes, and automatic differentiation.
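The idea of eliminating unnecessary voxel list generations can be illustrated with a toy model. The sketch below assumes a simplified, hypothetical launch stream. In this stream, each task either builds the active-voxel list of a sparse structure or computes over it, and each compute task records which structures it may (de)activate. The `Task` class, the task kinds, and `eliminate_redundant_listgen` are illustrative names only. They are not AsyncTaichi's API or its actual data-flow graph. Under these assumptions, a list-generation task can be dropped when no intervening task could have changed that structure's set of active voxels.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A hypothetical, simplified kernel-launch record (not Taichi's real IR)."""
    name: str
    kind: str                    # "listgen" or "compute" (illustrative kinds)
    snode: str = ""              # sparse structure a listgen task builds a voxel list for
    activates: set = field(default_factory=set)  # structures whose active set this task may change


def eliminate_redundant_listgen(tasks):
    """Drop a listgen task if the voxel list for its structure is still valid,
    i.e. no task since the previous listgen could have changed which voxels are active."""
    fresh = set()  # structures whose active-voxel list is currently up to date
    optimized = []
    for task in tasks:
        if task.kind == "listgen":
            if task.snode in fresh:
                continue  # list unchanged since last generation: skip this launch
            fresh.add(task.snode)
        else:
            fresh -= task.activates  # (de)activation invalidates the affected lists
        optimized.append(task)
    return optimized


# Example stream: two passes over the same sparse grid "x".
stream = [
    Task("listgen_x", "listgen", snode="x"),
    Task("advect", "compute"),
    Task("listgen_x", "listgen", snode="x"),   # redundant: "advect" activates nothing
    Task("project", "compute", activates={"x"}),
    Task("listgen_x", "listgen", snode="x"),   # needed: "project" may activate voxels in "x"
]
optimized = eliminate_redundant_listgen(stream)
print([t.name for t in optimized])  # ['listgen_x', 'advect', 'project', 'listgen_x']
```

In this toy stream the second `listgen_x` is removed because `advect` leaves the active set of `x` unchanged. The third is kept because `project` may activate voxels in `x`. A real system would also need to track dependencies between parent and child sparse structures and analyze kernel bodies to infer the activation sets. This sketch ignores both.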