Paper Title
A Big Data Based Framework for Executing Complex Query Over COVID-19 Datasets (COVID-QF)
Authors
Abstract
COVID-19's rapid global spread has driven the development of innovative tools for Big Data analytics. These tools have guided organizations across all fields of the health industry in tracking and minimizing the effects of the virus. Researchers need artificial intelligence, machine learning, and natural language processing to detect coronaviruses and to gain a complete understanding of the disease. Because COVID-19 data are generated in many countries around the world, only big data applications and NoSQL databases are suitable for handling them. Many platforms are used to process NoSQL database models, such as Spark, H2O, and Hadoop HDFS/MapReduce, and these are well suited to controlling and managing enormous amounts of data. Programmers of large applications face many challenges, especially those who work on COVID-19 databases through hybrid data models with different APIs and query languages. In this context, this paper proposes a storage framework, named COVID-QF, that handles both SQL and NoSQL databases for COVID-19 datasets. Its aim is to address the problems caused by the worldwide spread of the virus by reducing processing time. For NoSQL databases, COVID-QF uses Hadoop HDFS/MapReduce and Apache Spark. COVID-QF consists of three layers: a data collection layer, a storage layer, and a query processing layer. Data are gathered in the data collection layer. The storage layer divides the data into blocks for saving and processing, and it links the Spark connector to different database engines to reduce saving and retrieval time. The query processing layer executes the requested query and returns the results. The experiments in this study use three COVID-19 datasets that grow over time (COVID-19-Merging, COVID-19-inside-Hubei, and COVID-19-ex-Hubei). The results obtained confirm the superiority of the COVID-QF framework.
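The three-layer flow described in the abstract (collect, store in blocks, process queries over the blocks) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation and does not use Spark, HDFS, or any database connector. The CSV schema (`region,date,confirmed`), the function names `collect`, `store`, and `query_total_by_region`, and the sample rows are all hypothetical. Fixed-size lists stand in for HDFS blocks or Spark partitions, and the per-block "map" and merging "reduce" run sequentially here, whereas a real cluster would distribute them.

```python
"""Hypothetical three-layer pipeline in the spirit of COVID-QF (illustrative only)."""
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    region: str
    date: str
    confirmed: int


# Layer 1 (data collection): parse raw CSV text into typed records.
# Assumed schema: region,date,confirmed (hypothetical column names).
def collect(raw_csv: str) -> List[Record]:
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [Record(r["region"], r["date"], int(r["confirmed"])) for r in reader]


# Layer 2 (storage): split records into fixed-size blocks,
# a stand-in for HDFS blocks or Spark partitions.
def store(records: List[Record], block_size: int) -> List[List[Record]]:
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


# Layer 3 (query processing): map each block to partial sums,
# then reduce the partial sums into one result per region.
def query_total_by_region(blocks: List[List[Record]]) -> Dict[str, int]:
    def map_block(block: List[Record]) -> Dict[str, int]:
        partial: Dict[str, int] = defaultdict(int)
        for rec in block:
            partial[rec.region] += rec.confirmed
        return partial

    totals: Dict[str, int] = defaultdict(int)
    for partial in map(map_block, blocks):
        for region, count in partial.items():
            totals[region] += count
    return dict(totals)


# Made-up sample rows, used only to exercise the sketch.
SAMPLE = """region,date,confirmed
Hubei,2020-02-01,100
Guangdong,2020-02-01,20
Hubei,2020-02-02,150
Zhejiang,2020-02-02,30
"""

if __name__ == "__main__":
    blocks = store(collect(SAMPLE), block_size=2)
    print(len(blocks))  # 2
    print(query_total_by_region(blocks))  # {'Hubei': 250, 'Guangdong': 20, 'Zhejiang': 30}
```

Keeping each layer as a separate function mirrors the layered design: the storage step can change (for example, a different block size or backend) without touching the query logic, because queries only see a list of blocks.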