Paper Title
Task Selection for AutoML System Evaluation
Paper Authors
Paper Abstract
Our goal is to assess whether AutoML system changes (i.e., to the search space or hyperparameter optimization) will improve the final model's performance on production tasks. However, we cannot test the changes on production tasks directly. Instead, we only have access to limited descriptors of the tasks that our AutoML system previously executed, such as the number of data points or features. We also have a set of development tasks on which changes can be tested, e.g., tasks sampled from OpenML with no usage constraints. However, the development and production task distributions differ, leading us to pursue changes that improve development performance but not production performance. This paper proposes a method that leverages descriptor information about AutoML production tasks to select a filtered subset of the most relevant development tasks. Empirical studies show that our filtering strategy improves the ability to assess AutoML system changes on holdout tasks whose distributions differ from development.
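The abstract does not spell out the filtering strategy, so the following is only a minimal sketch of the general idea: select the development tasks whose descriptors lie closest to the production task descriptors. The function name filter_dev_tasks, the choice of descriptors (number of data points and number of features), and the parameter k are illustrative assumptions, not the paper's actual method or interface.

```python
import numpy as np

def filter_dev_tasks(dev_descriptors, prod_descriptors, k=1):
    """Illustrative filter: keep the development tasks whose descriptors
    are nearest (Euclidean distance in standardized descriptor space)
    to at least one production task descriptor."""
    dev = np.asarray(dev_descriptors, dtype=float)
    prod = np.asarray(prod_descriptors, dtype=float)

    # Standardize each descriptor column (e.g., #data points, #features)
    # with pooled statistics so distances are comparable across descriptors.
    pooled = np.vstack([dev, prod])
    mean, std = pooled.mean(axis=0), pooled.std(axis=0) + 1e-12
    dev_z, prod_z = (dev - mean) / std, (prod - mean) / std

    # For each production task, keep its k nearest development tasks.
    selected = set()
    for p in prod_z:
        dists = np.linalg.norm(dev_z - p, axis=1)
        selected.update(np.argsort(dists)[:k].tolist())
    return sorted(selected)

# Toy usage: each task is described by (n_data_points, n_features).
dev_tasks = [(1_000, 20), (50_000, 5), (200, 300), (10_000, 40)]
prod_tasks = [(12_000, 35), (60_000, 8)]
print(filter_dev_tasks(dev_tasks, prod_tasks, k=1))  # -> [1, 3]
```

In this toy example, the dev tasks at indices 1 and 3 are retained because their descriptors are closest to the two production tasks; the remaining dev tasks would be excluded when evaluating a candidate AutoML system change.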