Paper Title
NDN-TR70 -- Utilizing NDN-DPDK for Kubernetes Genomics Data Lake
Authors
Abstract
As the number of genomics samples grows rapidly due to increased access to high-resolution DNA sequencing technology, a scalable platform that aggregates dispersed datasets and enables easy access to the vast wealth of available DNA sequences is paramount. In this work, we introduce and demonstrate a novel way to use Named Data Networking (NDN) in conjunction with a Kubernetes cluster to design a flexible and scalable genomics Data Lake in the cloud. In addition, the use of the NDN Data Plane Development Kit (DPDK) provides efficient and accessible distribution of the datasets to researchers anywhere. This report explains the need to deploy a Data Lake for genomics data, what is necessary to deploy one successfully, and detailed instructions for replicating the proposed design. Finally, it outlines options for future enhancements.