Paper title
Operator as a Service: Stateful Serverless Complex Event Processing
Paper authors
Paper abstract
Complex Event Processing (CEP) is a powerful paradigm for scalable data management that is employed in many real-world scenarios, such as detecting credit card fraud in banks. Complex events are expressed in a specification language that is typically implemented and executed on a specific runtime system. While the tight coupling of these two components has been regarded as key to supporting CEP at high performance, this dependency poses several inherent challenges. (1) Developing applications atop a CEP system requires extensive knowledge of how the runtime system operates, which is typically highly complex. (2) Dependence on the specification language requires domain experts, further restricts application developers, and steepens their learning curve. In this paper, we propose CEPLESS, a scalable data management system that decouples the specification from the runtime system by building on the principles of serverless computing. CEPLESS provides operator as a service and offers flexibility by enabling the development of CEP applications in any specification language while abstracting away the complexity of the CEP runtime system. As part of CEPLESS, we designed and evaluated novel mechanisms for in-memory processing and batching that enable the stateful processing of CEP operators even under high rates of ingested events. Our evaluation demonstrates that CEPLESS can be easily integrated into existing CEP systems like Apache Flink while attaining similar throughput under a high event rate (up to 100K events per second) and performing dynamic operator updates in up to 238 ms.
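The combination of in-memory batching and a stateful, user-defined operator mentioned in the abstract can be illustrated with a minimal sketch. This is not the CEPLESS implementation or its API: the names `Event`, `BatchingQueue`, and `make_fraud_operator`, the batch size, and the spending-limit rule are all assumptions chosen only to make the idea concrete.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Event:
    """Hypothetical event type: one card transaction."""
    card: str
    amount: float


Operator = Callable[[List[Event]], List[str]]


class BatchingQueue:
    """Hypothetical in-memory queue that hands events to an operator in batches."""

    def __init__(self, operator: Operator, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._operator = operator
        self._batch_size = batch_size
        self._pending: Deque[Event] = deque()
        self.results: List[str] = []

    def push(self, event: Event) -> None:
        # Buffer the event; invoke the operator once a full batch is available.
        self._pending.append(event)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Deliver whatever is buffered (possibly a partial batch) to the operator.
        batch = list(self._pending)
        self._pending.clear()
        if batch:
            self.results.extend(self._operator(batch))


def make_fraud_operator(limit: float) -> Operator:
    """Build a stateful operator that keeps a running total per card (assumed rule)."""
    totals: Dict[str, float] = {}  # operator state, preserved across batches

    def operator(batch: List[Event]) -> List[str]:
        alerts: List[str] = []
        for event in batch:
            totals[event.card] = totals.get(event.card, 0.0) + event.amount
            if totals[event.card] > limit:
                alerts.append(
                    f"card {event.card}: total {totals[event.card]:.2f} "
                    f"exceeds {limit:.2f}"
                )
        return alerts

    return operator
```

In this sketch the operator is an ordinary function written without any knowledge of the surrounding runtime, and the queue amortizes the per-call overhead by invoking it once per batch. A real deployment would place the operator behind a service boundary, which the sketch omits.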