Paper Title
CachePerf: A Unified Cache Miss Classifier via Hybrid Hardware Sampling
Paper Authors
Paper Abstract
The cache plays a key role in determining the performance of applications, whether for sequential or concurrent programs on homogeneous or heterogeneous architectures. Fixing cache misses requires understanding both the origin and the type of the misses. However, this remains an unresolved issue even after decades of research. This paper proposes a unified profiling tool--CachePerf--that can correctly identify different types of cache misses, differentiate allocator-induced issues from application issues, and exclude minor issues with little performance impact. The core idea behind CachePerf is a hybrid sampling scheme: it employs PMU-based coarse-grained sampling to select the few susceptible instructions (those with frequent cache misses) and then employs breakpoint-based fine-grained sampling to collect the memory access patterns of those instructions. Based on our evaluation, CachePerf imposes only 14% performance overhead and 19% memory overhead (for applications with large footprints) while correctly identifying the types of cache misses. CachePerf detected 9 previously unknown bugs; fixing the reported bugs yields performance speedups ranging from 3% to 3788%. Due to its effectiveness and low overhead, CachePerf will be an indispensable complement to existing profilers.
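The abstract only sketches the hybrid sampling idea, so the following is a minimal, hypothetical C sketch of how such two-level sampling could be wired up on Linux with perf_event_open: a coarse-grained PMU event samples L1D load misses, and a hardware watchpoint then traps every access to one suspicious address. The function names, sampling period, and the watched variable are illustrative assumptions, not CachePerf's actual interface.

```c
/* Hypothetical sketch of hybrid hardware sampling, not CachePerf's code:
 * (1) a PMU event samples L1D load misses (coarse-grained) to locate
 *     susceptible instructions/addresses,
 * (2) a hardware watchpoint traps every access to one sampled address
 *     (fine-grained) so its access pattern can be recorded. */
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

static int perf_open(struct perf_event_attr *attr) {
    /* perf_event_open has no glibc wrapper; invoke the raw syscall. */
    return (int)syscall(SYS_perf_event_open, attr, 0 /* this process */,
                        -1 /* any CPU */, -1 /* no group */, 0);
}

/* Coarse-grained: sample one out of every `period` L1D load misses,
 * recording the instruction pointer and the faulting data address. */
static int open_cache_miss_sampler(uint64_t period) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = PERF_COUNT_HW_CACHE_L1D |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.sample_period = period;
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return perf_open(&attr);
}

/* Fine-grained: install a hardware watchpoint on one suspicious address
 * so that every read/write to it generates a sample. */
static int open_watchpoint(void *addr) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_BREAKPOINT;
    attr.bp_type = HW_BREAKPOINT_RW;
    attr.bp_addr = (uint64_t)(uintptr_t)addr;
    attr.bp_len = HW_BREAKPOINT_LEN_8;
    attr.sample_period = 1;          /* trap on every access */
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return perf_open(&attr);
}

int main(void) {
    static volatile long hot;        /* stand-in for a suspicious object */
    int miss_fd = open_cache_miss_sampler(10000);
    int bp_fd = open_watchpoint((void *)&hot);
    if (miss_fd < 0 || bp_fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(miss_fd, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(bp_fd, PERF_EVENT_IOC_ENABLE, 0);
    for (long i = 0; i < 1000000; i++) hot += i;   /* touch the address */
    ioctl(bp_fd, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(miss_fd, PERF_EVENT_IOC_DISABLE, 0);

    /* A real tool would mmap the perf ring buffers and classify the
     * recorded access patterns; here we only confirm both events open. */
    close(bp_fd);
    close(miss_fd);
    return 0;
}
```

In this sketch the coarse sampler keeps overhead low by observing only a small fraction of misses, while the watchpoint gives exact per-access visibility for the handful of addresses worth inspecting, which mirrors the two-level trade-off the abstract describes.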