Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping

Paper Title

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping

Authors

Carl-Johannes Johnsen, Tiziano De Matteis, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler

Abstract

The multi-pumping resource sharing technique can overcome the limitations commonly found in single-clocked FPGA designs by allowing hardware components to operate at a higher clock frequency than the surrounding system. However, this optimization cannot be expressed in high levels of abstraction, such as HLS, requiring the use of hand-optimized RTL. In this paper we show how to leverage multiple clock domains for computational subdomains on reconfigurable devices through data movement analysis on high-level programs. We offer a novel view on multi-pumping as a compiler optimization: a superclass of traditional vectorization. As multiple data elements are fed and consumed, the computations are packed temporally rather than spatially. The optimization is applied automatically using an intermediate representation that maps high-level code to HLS. Internally, the optimization injects modules into the generated designs, incorporating RTL for fine-grained control over the clock domains. We obtain a reduction of resource consumption by up to 50% on critical components and 23% on average. For scalable designs, this can enable further parallelism, increasing overall performance.
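
To make the contrast between spatial and temporal packing concrete, the following is a minimal software sketch and not the paper's implementation. It is a toy cycle-counting model in Python. The function names, the `width` and `pump` parameters, and the idea of counting "units" as a stand-in for hardware resources are all assumptions introduced here for illustration. The model also ignores the cost of the clock-domain-crossing logic that a real design would need.

```python
def spatially_vectorized(data, op, width=2):
    """Hypothetical model: `width` copies of `op`, all in the slow clock domain.

    Each slow cycle, one wide word of `width` elements is consumed and every
    copy of `op` handles one element in parallel.
    """
    units = width          # replicated compute units
    slow_cycles = 0
    out = []
    for i in range(0, len(data), width):
        out.extend(op(x) for x in data[i:i + width])
        slow_cycles += 1
    return out, units, slow_cycles


def temporally_vectorized(data, op, pump=2):
    """Hypothetical model: one copy of `op` in a clock domain `pump` times faster.

    Each slow cycle still delivers one wide word of `pump` elements, but the
    single unit processes them one after another in `pump` fast cycles.
    """
    units = 1              # a single shared compute unit
    slow_cycles = 0
    out = []
    for i in range(0, len(data), pump):
        wide_word = data[i:i + pump]   # arrives within one slow cycle
        for x in wide_word:            # one fast cycle per element
            out.append(op(x))
        slow_cycles += 1
    return out, units, slow_cycles


if __name__ == "__main__":
    values = list(range(8))
    square_plus_one = lambda x: x * x + 1
    print(spatially_vectorized(values, square_plus_one))
    print(temporally_vectorized(values, square_plus_one))
```

In this toy model both variants produce the same results in the same number of slow-domain cycles, while the temporal variant uses half as many compute units when `pump` is 2. That halving is only the idealized behavior of the sketch; the figures reported in the abstract come from real designs.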

Paper Title