Paper title
Automated Parallel Kernel Extraction from Dynamic Application Traces
Paper authors
Paper abstract
Modern program runtime is dominated by segments of repeating code called kernels. Kernels can be accelerated by increasing memory locality, increasing data parallelism, and exploiting producer-consumer parallelism among kernels, which requires hardware specialized for a particular class of kernels. Programming this hardware can be difficult, requiring that the kernels be identified and annotated in the code or translated to a domain-specific language. This paper describes a technique to automatically localize parallel kernels from a dynamic application trace, facilitating further code optimization. Dynamic trace collection is fast and compact. With optimization, it incurs a time dilation of only a factor of nine and a file size of one megabyte per second, addressing a significant criticism of this approach. Kernel extraction is accurate and is performed in linear time within logarithmic memory, detecting a wide range of kernels. This approach was validated across 16 libraries comprising 10,507 kernel instances. To validate the accuracy of the detected kernels, five test programs were written that span traditional kernel definitions and were certified to contain all the kernels that were expected.