Paper Title
Distributionally Robust Models with Parametric Likelihood Ratios
Paper Authors
Paper Abstract
As machine learning models are deployed ever more broadly, it becomes increasingly important that they are not only able to perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift. The Distributionally Robust Optimization (DRO) framework proposes to address this issue by training models to minimize their expected risk under a collection of distributions, to imitate test-time shifts. This is most commonly achieved by instance-level re-weighting of the training objective to emulate the likelihood ratio between possible test distributions and the training distribution, which allows their empirical risk to be estimated via importance sampling (assuming that they are subpopulations of the training distribution). However, re-weighting schemes in the literature are usually limited due to the difficulty of keeping the optimization problem tractable and the complexity of enforcing normalization constraints. In this paper, we show that three simple ideas -- mini-batch level normalization, a KL penalty and simultaneous gradient updates -- allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning. Code to reproduce our experiments can be found at https://github.com/pmichel31415/P-DRO.
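To make the abstract's three ingredients concrete, below is a minimal PyTorch sketch of one adversarial training step in this spirit. It is an illustration under simplifying assumptions, not the authors' implementation (see the linked repository for that): the function name `pdro_step`, the discriminative adversary that directly outputs a per-example log likelihood ratio, and the penalty weight `tau` are all hypothetical stand-ins, and the paper's parametric adversary need not take this form.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pdro_step(model, adversary, opt_model, opt_adv, x, y, tau=0.01):
    """One simultaneous min-max update (illustrative sketch, not the paper's code)."""
    # Per-example task losses under the current model.
    losses = F.cross_entropy(model(x), y, reduction="none")

    # Assumed parameterization: the adversary emits one scalar per example,
    # read as the log likelihood ratio log q_psi(x) - log p_train(x).
    log_ratios = adversary(x).squeeze(-1)

    # Idea 1: mini-batch level normalization. A softmax over the batch yields
    # self-normalized importance weights that sum to one, sidestepping any
    # dataset-wide normalization constraint.
    weights = torch.softmax(log_ratios, dim=0)

    # Idea 2: KL penalty. With self-normalized weights, E_q[log(q/p)] is
    # estimated by the weighted sum of log-ratios; it discourages adversarial
    # distributions that drift too far from the training distribution.
    kl_estimate = torch.sum(weights * log_ratios)

    # Min player: the model minimizes the re-weighted risk (weights frozen).
    model_loss = torch.sum(weights.detach() * losses)
    # Max player: the adversary maximizes the re-weighted risk minus the KL
    # penalty (losses frozen), so we minimize the negation.
    adv_loss = -(torch.sum(weights * losses.detach()) - tau * kl_estimate)

    # Idea 3: simultaneous gradient updates on both players instead of
    # solving the inner maximization to convergence at every step.
    opt_model.zero_grad()
    opt_adv.zero_grad()
    model_loss.backward()  # reaches only the model's parameters
    adv_loss.backward()    # reaches only the adversary's parameters
    opt_model.step()
    opt_adv.step()
    return model_loss.item()


# Toy usage with made-up shapes: a 10-feature binary classification batch.
model = nn.Linear(10, 2)
adversary = nn.Linear(10, 1)
opt_model = torch.optim.SGD(model.parameters(), lr=1e-2)
opt_adv = torch.optim.SGD(adversary.parameters(), lr=1e-2)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
pdro_step(model, adversary, opt_model, opt_adv, x, y)
```

Under these assumptions, normalizing the weights within each mini-batch keeps the weighted risk a self-normalized importance-sampling estimate without enforcing a constraint over the full training set, and giving each player a single gradient step per batch keeps the min-max optimization tractable at a per-step cost close to standard training.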