Paper Title

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Authors

Elham Azhir, Mehdi Hosseinzadeh, Faheem Khan, Amir Mosavi

Abstract

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this approach, the query optimizer divides the query space into clusters. However, traditional clustering algorithms require significant execution time to cluster such large datasets. The MapReduce distributed computing model provides an efficient solution for storing and processing vast quantities of data. In this study, the Apache Spark and Apache Hadoop frameworks are used to cluster query datasets of different sizes in a MapReduce-based access plan recommendation method. Performance is evaluated in terms of execution time. The experimental results demonstrate the effectiveness of parallel query clustering in achieving high scalability. Furthermore, Apache Spark outperformed Apache Hadoop, reaching an average speedup of 2x.
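
The idea of clustering queries in a MapReduce style and reusing a stored plan for a new query can be illustrated with a small, single-machine sketch. The abstract does not name the clustering algorithm or the query features, so the code below assumes k-means over hypothetical numeric query feature vectors; the function names (`map_phase`, `reduce_phase`, `recommend_plan`) and the plan labels are illustrative and are not taken from the paper. In Hadoop or Spark, the map and reduce steps would run in parallel across partitions rather than in one Python process.

```python
from collections import defaultdict
from math import dist


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def map_phase(points, centroids):
    """Map: emit (cluster_id, (vector, 1)) for every query feature vector."""
    for p in points:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the vectors per cluster and average them into new centroids."""
    sums, counts = {}, defaultdict(int)
    for cid, (vec, n) in pairs:
        acc = sums.get(cid)
        sums[cid] = list(vec) if acc is None else [a + b for a, b in zip(acc, vec)]
        counts[cid] += n
    # Keep the old centroid for any cluster that received no points.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (assumed k-means, see lead-in)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


def recommend_plan(query_vec, centroids, plans):
    """Reuse the stored plan of the cluster nearest to the new query."""
    return plans[nearest(query_vec, centroids)]


# Hypothetical feature vectors for four past queries and two stored plans.
past_queries = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids = kmeans(past_queries, [(0, 0), (10, 10)])
plans = ["plan_a", "plan_b"]
recommended = recommend_plan((9, 9), centroids, plans)
```

Here a new query whose features are close to the second cluster is served with that cluster's stored plan instead of being optimized from scratch, which is the reuse the abstract describes.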
