Paper title
HEP-Frame: an Efficient Tool for Big Data Applications at the LHC
Paper authors
Abstract
HEP-Frame is a new C++ package designed to efficiently analyse data sets containing a very large number of events, such as those available at the Large Hadron Collider (LHC) at CERN, Geneva. It mainly targets high-performance servers and mini-clusters, and it was designed for natural science experts, with a user-friendly interface to access structured databases. HEP-Frame automatically evaluates the underlying computing resources and builds an adequate code skeleton when creating a data analysis application. At run time, HEP-Frame analyses a sequence of data sets, exploiting the parallelism available in the code and in the hardware resources: it reads inputs from a user-defined data structure and processes them concurrently, following the user-specified sequence of requirements to select relevant data; it manages the efficient execution of that sequence; and it outputs results in user-defined objects (e.g., ROOT structures), stored together with the input data used. This paper shows how software development by domain experts can benefit from HEP-Frame, and how it significantly improved the performance of analyses of large data sets produced in proton-proton collisions at the LHC. Two case studies are discussed: the associated production of top quarks together with a Higgs boson (ttH) at the LHC, and double and single top quark production at the High-Luminosity phase of the LHC (HL-LHC). Results show that HEP-Frame's awareness of the analysis code behavior and structure, and of the underlying hardware system, provides powerful and transparent parallelization mechanisms that largely improve the execution time of data analysis applications.
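
The "sequence of requirements" that the abstract mentions can be pictured as an ordered list of selection cuts applied to each event, with an event dropped at the first cut it fails. The C++ sketch below is only an illustration of that idea under stated assumptions. It does not use the HEP-Frame API: the `Event` fields, the `Cut` type, and `run_selection` are hypothetical names introduced here, and the loop is sequential, whereas HEP-Frame is described as reading and processing events concurrently.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical event record; the fields are illustrative, not HEP-Frame's.
struct Event {
    int n_jets;
    int n_leptons;
    double missing_et;  // missing transverse energy, in GeV
};

// A named selection requirement ("cut"); passes() returns true to keep the event.
struct Cut {
    std::string name;
    std::function<bool(const Event&)> passes;
};

// Events that passed every cut, plus a per-cut count of events that passed it.
struct SelectionResult {
    std::vector<Event> selected;
    std::vector<std::size_t> passed_per_cut;
};

// Apply the cuts in the given order; stop evaluating an event at its first failure.
SelectionResult run_selection(const std::vector<Event>& events,
                              const std::vector<Cut>& cuts) {
    SelectionResult result;
    result.passed_per_cut.assign(cuts.size(), 0);
    for (const Event& ev : events) {
        bool keep = true;
        for (std::size_t i = 0; i < cuts.size(); ++i) {
            if (!cuts[i].passes(ev)) {
                keep = false;
                break;  // early rejection: later cuts are never evaluated
            }
            ++result.passed_per_cut[i];
        }
        if (keep) {
            result.selected.push_back(ev);
        }
    }
    return result;
}

// Illustrative cut sequence; the thresholds are assumptions, not taken from the paper.
std::vector<Cut> example_cuts() {
    return {
        {"at least 4 jets", [](const Event& e) { return e.n_jets >= 4; }},
        {"at least 1 lepton", [](const Event& e) { return e.n_leptons >= 1; }},
        {"missing ET > 20 GeV", [](const Event& e) { return e.missing_et > 20.0; }},
    };
}
```

Because an event leaves the sequence at its first failing cut, the order of the cuts affects how much work is done per event. That is one reason a framework aware of the analysis structure, as the abstract describes, has room to improve execution time.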