Paper Title
Accelerating Filesystem Checking and Repair with pFSCK
Paper Authors
Paper Abstract
File system checking and recovery (C/R) tools play a pivotal role in increasing the reliability of storage software by identifying and correcting file system inconsistencies. However, with increasing disk capacity and data content, file system C/R tools notoriously suffer from long runtimes. We posit that current file system checkers fail to exploit the CPU parallelism and high throughput offered by modern storage devices. To overcome these challenges, we propose pFSCK, a tool that redesigns C/R to enable fine-grained parallelism at the granularity of inodes without impacting the correctness of C/R's functionality. To accelerate C/R, pFSCK first employs data parallelism by identifying functional operations in each stage of the checker and isolating dependent operations and their shared data structures. However, fully isolating shared structures is infeasible, consequently requiring serialization that limits scalability. To reduce the impact of synchronization bottlenecks and exploit CPU parallelism, pFSCK designs pipeline parallelism, allowing multiple stages of C/R to run simultaneously without impacting correctness. To realize efficient pipeline parallelism for different file system data configurations, pFSCK provides techniques for ordering updates to global data structures, efficient per-thread I/O cache management, and dynamic thread placement across different passes of C/R. Finally, pFSCK designs a resource-aware scheduler aimed at reducing the impact of C/R on other applications sharing CPUs and the file system. Evaluation of pFSCK shows gains of more than 2.6x over e2fsck and more than 1.8x over XFS's checker, which provides coarse-grained parallelism.
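The combination of data parallelism within a pass and pipeline parallelism across passes can be illustrated with a small toy model. The sketch below is not pFSCK's implementation: the inode dictionaries, the `run_pipeline` function, and the two simplified passes are hypothetical names chosen for illustration only. It assumes that pass 1 scans inodes and records block ownership in a shared structure guarded by a lock, while pass 2 starts checking directories as soon as pass 1 hands them over.

```python
import queue
import threading

SENTINEL = None  # marks the end of a work stream


def run_pipeline(inodes, workers_per_pass=2):
    """Toy two-pass checker (hypothetical, for illustration only)."""
    inode_queue = queue.Queue()   # input to pass 1
    dir_queue = queue.Queue()     # hand-off from pass 1 to pass 2
    used_blocks = {}              # shared structure: block -> owning inode id
    lock = threading.Lock()       # serializes updates to shared structures
    duplicates = []               # blocks claimed by more than one inode
    checked_dirs = []             # directory inodes processed by pass 2

    for ino in inodes:
        inode_queue.put(ino)
    for _ in range(workers_per_pass):
        inode_queue.put(SENTINEL)

    def pass1():
        # Data parallelism: several workers scan different inodes.
        while True:
            ino = inode_queue.get()
            if ino is SENTINEL:
                break
            with lock:  # the shared block map cannot be fully isolated
                for blk in ino["blocks"]:
                    if blk in used_blocks:
                        duplicates.append(blk)
                    else:
                        used_blocks[blk] = ino["id"]
            if ino["is_dir"]:
                # Pipeline parallelism: pass 2 can start on this directory
                # while pass 1 is still scanning other inodes.
                dir_queue.put(ino)

    def pass2():
        while True:
            d = dir_queue.get()
            if d is SENTINEL:
                break
            with lock:
                checked_dirs.append(d["id"])

    p1 = [threading.Thread(target=pass1) for _ in range(workers_per_pass)]
    p2 = [threading.Thread(target=pass2) for _ in range(workers_per_pass)]
    for t in p1 + p2:
        t.start()
    for t in p1:
        t.join()
    # Pass 1 is done, so no more directories will arrive: stop pass 2.
    for _ in p2:
        dir_queue.put(SENTINEL)
    for t in p2:
        t.join()
    return sorted(duplicates), sorted(checked_dirs)
```

The lock around the block map stands in for the serialization that the abstract says limits pure data parallelism; overlapping the two passes is the toy counterpart of running multiple C/R stages at once. A real checker would also need the update ordering, per-thread I/O caching, and dynamic thread placement the abstract mentions, none of which is modeled here.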