Paper Title

A low-overhead soft-hard fault-tolerant architecture, design and management scheme for reliable high-performance many-core 3D-NoC systems

Authors

Dang, Khanh N.; Meyer, Michael; Okuyama, Yuichi; Abdallah, Abderazek Ben

Abstract

The Network-on-Chip (NoC) paradigm has been proposed as a favorable solution to handle the strict communication requirements between the increasingly large number of cores on a single chip. However, NoC systems are exposed to the aggressive scaling down of transistors, low operating voltages, and high integration and power densities, making them vulnerable to permanent (hard) faults and transient (soft) errors. A hard fault in a NoC can lead to external blocking, causing congestion across the whole network. A soft error is more challenging because of its silent data corruption, which leads to a large area of erroneous data due to error propagation, packet re-transmission, and deadlock. In this paper, we present the architecture and design of a comprehensive soft error and hard fault-tolerant 3D-NoC system, named 3D-Hard-Fault-Soft-Error-Tolerant-OASIS-NoC (3D-FETO). With the aid of efficient mechanisms and algorithms, 3D-FETO is capable of detecting and recovering from soft errors which occur in the routing pipeline stages and leverages reconfigurable components to handle permanent faults in links, input buffers, and crossbars. In-depth evaluation results show that the 3D-FETO system is able to work around different kinds of hard faults and soft errors, ensuring graceful performance degradation, while minimizing additional hardware complexity and remaining power efficient.
