Paper Title

Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms

Authors

Dimitrios Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso

Abstract

The enormous quantity of data produced every day, together with advances in data analytics, has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code. In fact, changes in the platform affect only sub-operators that depend on the underlying hardware. We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and GROUP BY queries, none of which are supported by the hand-tuned join.
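
To make the sub-operator idea concrete, the following is a minimal Python sketch, not the Modularis implementation or its API. It assumes a distributed equi-join can be split into four hypothetical building blocks (`scan`, `partition`, `local_exchange`, `build_probe`), all of which are names invented here for illustration. Only the exchange step is meant to be platform-specific; in this sketch it is an in-memory stand-in for whatever an RDMA cluster, serverless platform, or smart storage engine would provide.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); an assumed toy schema


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Platform-independent sub-operator: emit the rows of one shard."""
    yield from rows


def partition(rows: Iterable[Row], n: int) -> List[List[Row]]:
    """Platform-independent sub-operator: hash-partition rows on the key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts


def local_exchange(per_worker: List[List[List[Row]]]) -> List[List[Row]]:
    """Hypothetical platform-specific sub-operator.

    per_worker[w][p] holds the rows that worker w sends to partition p.
    This in-memory version stands in for a real network shuffle.
    """
    n = len(per_worker[0])
    return [[row for parts in per_worker for row in parts[p]] for p in range(n)]


def build_probe(build: Iterable[Row], probe: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Platform-independent sub-operator: in-memory hash join of one partition."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    for key, payload in probe:
        for build_payload in table.get(key, []):
            yield (key, build_payload, payload)


Exchange = Callable[[List[List[List[Row]]]], List[List[Row]]]


def distributed_join(left_shards: List[List[Row]],
                     right_shards: List[List[Row]],
                     exchange: Exchange) -> List[Tuple[int, str, str]]:
    """Compose the sub-operators; only `exchange` depends on the platform."""
    n = len(left_shards)
    left = exchange([partition(scan(s), n) for s in left_shards])
    right = exchange([partition(scan(s), n) for s in right_shards])
    out: List[Tuple[int, str, str]] = []
    for p in range(n):
        out.extend(build_probe(left[p], right[p]))
    return sorted(out)


if __name__ == "__main__":
    left = [[(1, "a"), (2, "b")], [(3, "c")]]
    right = [[(3, "x")], [(1, "y"), (4, "z")]]
    print(distributed_join(left, right, local_exchange))
```

Under these assumptions, moving the join to a different platform would mean passing a different `exchange` function while the other three building blocks stay unchanged, which mirrors the abstract's claim that platform changes affect only hardware-dependent sub-operators.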