Paper Title
Benchmarking AutoML algorithms on a collection of synthetic classification problems
Paper Authors
Paper Abstract
Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and flexibility in adapting to different problems and datasets. With the increasing number of AutoML algorithms, deciding which one best suits a given problem becomes increasingly difficult. It is therefore essential to use complex and challenging benchmarks that can differentiate AutoML algorithms from one another. This paper compares the performance of four AutoML algorithms: the Tree-based Pipeline Optimization Tool (TPOT), Auto-Sklearn, Auto-Sklearn 2, and H2O AutoML. We use the Diverse and Generative ML benchmark (DIGEN), a collection of synthetic datasets derived from generative functions designed to highlight the strengths and weaknesses of common machine learning algorithms. We confirm that AutoML can identify pipelines that perform well on all included datasets. Most AutoML algorithms performed similarly; however, there were some differences depending on the specific dataset and metric used.
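
To make the experimental setup concrete, the following is a minimal sketch of how one of the compared systems (TPOT, using its classic scikit-learn-style interface) could be run on a single synthetic binary-classification dataset and scored with ROC AUC. The CSV path, column name, and small search budget are hypothetical placeholders for illustration only; they are not the paper's actual configuration, which uses the DIGEN datasets and larger budgets.

import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Hypothetical export of one synthetic dataset to CSV with a binary "target" column.
df = pd.read_csv("digen_dataset.csv")
X, y = df.drop(columns=["target"]), df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Small search budget for illustration; a real benchmark run would use more
# generations, a larger population, and a time limit.
automl = TPOTClassifier(
    generations=5,
    population_size=20,
    scoring="roc_auc",
    random_state=42,
    verbosity=2,
)
automl.fit(X_train, y_train)

print("Test ROC AUC:", automl.score(X_test, y_test))
automl.export("best_pipeline.py")  # save the best pipeline found as a Python script

Scoring with ROC AUC on a held-out split mirrors the kind of per-dataset, per-metric comparison the abstract describes; repeating this loop over every dataset and every AutoML system yields the performance differences reported.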