Paper title

A scalable pipeline for COVID-19: the case study of Germany, Czechia and Poland

Authors

Wildan Abdussalam, Adam Mertel, Kai Fan, Lennart Schüler, Weronika Schlechte-Wełnicz, Justin M. Calabrese

Abstract

Throughout the coronavirus disease 2019 (COVID-19) pandemic, decision makers have relied on forecasting models to determine and implement non-pharmaceutical interventions (NPIs). Building these forecasting models requires continuously updated datasets from various stakeholders, including developers, analysts, and testers, to provide precise predictions. Here we report the design of a scalable pipeline, named where2test, which serves as a data synchronization layer supporting inter-country, top-down spatiotemporal observations and forecasting models of COVID-19 for Germany, Czechia, and Poland. We have built an operational data store (ODS) using PostgreSQL to continuously consolidate datasets from multiple data sources, perform collaborative work, facilitate high-performance data analysis, and trace changes. The ODS has been built to store COVID-19 data not only from Germany, Czechia, and Poland but also from other areas. Employing the dimensional fact model, the metadata schema is capable of synchronizing the various data structures from those regions and is scalable to the entire world. Next, the ODS is populated using batch Extract, Transform, and Load (ETL) jobs. SQL queries are subsequently created to reduce the need for users to pre-process data. The data can then support not only forecasting with a version-controlled ARIMA-Holt model and other analyses for decision making, but also risk calculator and optimisation apps. The data synchronization runs daily, and the results are displayed at https://www.where2test.de.
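The abstract gives no schema or job definitions, so the following is only a minimal sketch of the general idea: a region dimension, a fact table of daily case counts, a batch ETL load that can be re-run safely, and a query that spares users the pre-processing. Every table, column, and function name here is hypothetical, the sample values in the usage note are made up, and Python's built-in sqlite3 stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical star-style schema: one dimension table and one fact table.
SCHEMA = """
CREATE TABLE dim_region (
    region_id INTEGER PRIMARY KEY,
    country   TEXT NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (country, name)
);
CREATE TABLE fact_cases (
    region_id   INTEGER NOT NULL REFERENCES dim_region (region_id),
    report_date TEXT    NOT NULL,
    new_cases   INTEGER NOT NULL,
    PRIMARY KEY (region_id, report_date)
);
"""


def create_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def transform(raw):
    """Normalize one raw record so different sources share one structure."""
    return (
        raw["country"].strip().upper(),
        raw["region"].strip(),
        raw["date"],
        int(raw["cases"]),
    )


def load_batch(conn, raw_rows):
    """Load one batch; re-running it updates existing facts instead of duplicating them."""
    with conn:  # commit the whole batch, or roll it back on error
        for raw in raw_rows:
            country, region, date, cases = transform(raw)
            conn.execute(
                "INSERT OR IGNORE INTO dim_region (country, name) VALUES (?, ?)",
                (country, region),
            )
            (region_id,) = conn.execute(
                "SELECT region_id FROM dim_region WHERE country = ? AND name = ?",
                (country, region),
            ).fetchone()
            conn.execute(
                "INSERT INTO fact_cases (region_id, report_date, new_cases) "
                "VALUES (?, ?, ?) "
                "ON CONFLICT (region_id, report_date) "
                "DO UPDATE SET new_cases = excluded.new_cases",
                (region_id, date, cases),
            )


def cases_by_country(conn):
    """Prepared query: daily case totals per country."""
    return conn.execute(
        "SELECT r.country, f.report_date, SUM(f.new_cases) "
        "FROM fact_cases AS f JOIN dim_region AS r USING (region_id) "
        "GROUP BY r.country, f.report_date "
        "ORDER BY r.country, f.report_date"
    ).fetchall()
```

A daily job under these assumptions would call `load_batch(conn, rows)` with records such as `{"country": "de", "region": "Sachsen", "date": "2021-01-01", "cases": 10}` (an invented value) and then read `cases_by_country(conn)`. Keying the fact table on region and date is what makes a repeated load overwrite a revised count rather than add a second row.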
