Model-specific Data Subsampling with Influence Functions

Paper Title

Model-specific Data Subsampling with Influence Functions

Authors

Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Abstract

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performance. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
