Paper title
TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling
Paper authors
Paper abstract
This work introduces TrimTuner, the first system for optimizing machine learning jobs in the cloud that exploits sub-sampling techniques to reduce the cost of the optimization process while taking user-specified constraints into account. TrimTuner jointly optimizes cloud and application-specific parameters and, unlike state-of-the-art approaches to cloud optimization, eschews the need to train the model on the full training set every time a new configuration is sampled. Indeed, by leveraging sub-sampling techniques and datasets up to 60x smaller than the original one, we show that TrimTuner can reduce the cost of the optimization process by up to 50x. Further, TrimTuner speeds up the recommendation process by 65x with respect to state-of-the-art hyper-parameter optimization techniques that use sub-sampling. The reasons for this improvement are twofold: i) a novel domain-specific heuristic that reduces the number of configurations for which the acquisition function has to be evaluated; ii) the adoption of an ensemble of decision trees, which boosts the speed of the recommendation process by one additional order of magnitude.
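The abstract attributes the faster recommendation to two ingredients: shrinking the set of configurations on which the acquisition function is evaluated, and using an ensemble of decision trees as the surrogate model. The Python sketch below shows, under stated assumptions, how such pieces can fit together in a generic constrained Bayesian-optimization step (stdlib only). Bagged regression trees stand in for the ensemble, with the spread of per-tree predictions used as the uncertainty estimate; constrained expected improvement (minimize cost subject to an accuracy threshold) is used as the acquisition function; and the `top_k` shortlist by predicted cost is a hypothetical placeholder filter. None of these is claimed to be TrimTuner's actual heuristic, surrogate, or acquisition function, and all names (`TreeEnsemble`, `recommend`, `top_k`, `min_acc`) and the (number of VMs, batch size) features are illustrative.

```python
import math
import random
import statistics

def build_tree(X, y, depth, rng, min_leaf=2):
    """Grow a regression tree with random (feature, threshold) splits."""
    if depth == 0 or len(y) <= min_leaf or len(set(y)) == 1:
        return {"leaf": statistics.fmean(y)}
    feat = rng.randrange(len(X[0]))
    values = sorted({row[feat] for row in X})
    if len(values) < 2:  # feature is constant in this node; stop splitting
        return {"leaf": statistics.fmean(y)}
    thr = rng.choice(values[:-1])  # both children are guaranteed non-empty
    left = [i for i, row in enumerate(X) if row[feat] <= thr]
    right = [i for i, row in enumerate(X) if row[feat] > thr]
    return {
        "feat": feat,
        "thr": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, rng, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, rng, min_leaf),
    }

def predict_tree(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["leaf"]

class TreeEnsemble:
    """Bagged trees; the spread of per-tree predictions acts as uncertainty."""

    def __init__(self, n_trees=20, depth=4, seed=0):
        self.n_trees, self.depth = n_trees, depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], self.depth, self.rng))
        return self

    def predict(self, x):
        preds = [predict_tree(t, x) for t in self.trees]
        return statistics.fmean(preds), statistics.pstdev(preds)

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """EI for minimization under a Gaussian N(mu, sigma^2) approximation."""
    if sigma <= 1e-12:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def prob_at_least(mu, sigma, threshold):
    """Probability that a Gaussian N(mu, sigma^2) is >= threshold."""
    if sigma <= 1e-12:
        return 1.0 if mu >= threshold else 0.0
    return 1.0 - norm_cdf((threshold - mu) / sigma)

def recommend(cost_model, acc_model, candidates, best_cost, min_acc, top_k=10):
    # Hypothetical pre-filter (NOT TrimTuner's heuristic): evaluate the
    # acquisition only on the top_k candidates with the lowest predicted cost.
    shortlist = sorted(candidates, key=lambda c: cost_model.predict(c)[0])[:top_k]

    def score(c):
        mu_c, sd_c = cost_model.predict(c)
        mu_a, sd_a = acc_model.predict(c)
        # Constrained EI: expected cost improvement times P(accuracy >= min_acc).
        return expected_improvement(mu_c, sd_c, best_cost) * prob_at_least(mu_a, sd_a, min_acc)

    return max(shortlist, key=score)

if __name__ == "__main__":
    # Synthetic observations; features are (number of VMs, batch size).
    X = [(1, 16), (2, 64), (4, 256), (8, 256), (2, 16), (4, 64)]
    cost = [1.0, 3.0, 8.0, 15.0, 1.5, 4.0]
    acc = [0.60, 0.75, 0.90, 0.91, 0.62, 0.80]
    cands = [(n, b) for n in (1, 2, 4, 8) for b in (16, 64, 256)]
    cost_model = TreeEnsemble(seed=1).fit(X, cost)
    acc_model = TreeEnsemble(seed=2).fit(X, acc)
    print(recommend(cost_model, acc_model, cands, best_cost=8.0, min_acc=0.85, top_k=5))
```

Fitting and querying bagged trees is cheap compared with a Gaussian process, whose exact training cost grows cubically with the number of observations; this is one plausible reason a tree ensemble speeds up recommendation, although the paper's actual surrogate and acquisition details may differ from this sketch.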