Paper Title

The BlackParrot BedRock Cache Coherence System

Authors

Mark Wyse, Daniel Petrisko, Farzam Gilani, Yuan-Mao Chueh, Paul Gao, Dai Cheol Jung, Sripathi Muralitharan, Shashank Vijaya Ranga, Mark Oskin, Michael Taylor

Abstract

This paper presents BP-BedRock, the open-source cache coherence protocol and system implemented within the BlackParrot 64-bit RISC-V multicore processor. BP-BedRock implements the BedRock directory-based MOESIF cache coherence protocol and includes two different open-source coherence protocol engines, one FSM-based and the other microcode programmable. Both coherence engines support coherent uncacheable access to cacheable memory and L1-based atomic read-modify-write operations. Fitted within the BlackParrot multicore, BP-BedRock has been silicon validated in a GlobalFoundries 12nm FinFET process and FPGA validated with both coherence engines in 8-core configurations, booting Linux and running off-the-shelf benchmarks. After describing BP-BedRock and the design of the two coherence engines, we study their performance by analyzing processing occupancy and running the Splash-3 benchmarks on the 8-core FPGA implementations. Careful design and coherence-specific ISA extensions enable the programmable controller to achieve performance within 1% of the fixed-function FSM controller on average (2.3% worst-case) as demonstrated on our FPGA test system. Analysis shows that the programmable coherence engine increases die area by only 4% in an ASIC process and increases logic utilization by only 6.3% on FPGA with one additional block RAM added per core.
