Paper Title
Fairness via In-Processing in the Over-parameterized Regime: A Cautionary Tale
Paper Authors
Paper Abstract
The success of DNNs is driven by the counter-intuitive ability of over-parameterized networks to generalize, even when they perfectly fit the training data. In practice, test error often continues to decrease with increasing over-parameterization, a phenomenon referred to as double descent. This allows practitioners to instantiate large models without having to worry about over-fitting. Despite its benefits, however, prior work has shown that over-parameterization can exacerbate bias against minority subgroups. Several fairness-constrained DNN training methods have been proposed to address this concern. Here, we critically examine MinDiff, a fairness-constrained training procedure implemented within TensorFlow's Responsible AI Toolkit, which aims to achieve Equality of Opportunity. We show that although MinDiff improves fairness for under-parameterized models, it is likely to be ineffective in the over-parameterized regime. This is because an overfit model with zero training loss is trivially group-wise fair on the training data, creating an "illusion of fairness" and thus turning off the MinDiff optimization (this applies to any disparity-based measure that depends on errors or accuracy; it does not apply to demographic parity). Within specified fairness constraints, under-parameterized MinDiff models can even have lower error than their over-parameterized counterparts (despite baseline over-parameterized models having lower error). We further show that MinDiff optimization is very sensitive to the choice of batch size in the under-parameterized regime. Thus, fair model training using MinDiff requires time-consuming hyper-parameter searches. Finally, we suggest using previously proposed regularization techniques, viz. L2 regularization, early stopping, and flooding, in conjunction with MinDiff to train fair over-parameterized models.
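For concreteness, here is a minimal sketch of how a MinDiff model is typically assembled with the TensorFlow Model Remediation library (`pip install tensorflow-model-remediation`), which hosts the implementation the abstract refers to. The architecture, hyper-parameter values, and dataset objects (`train_ds`, `sensitive_ds`, `nonsensitive_ds`) are hypothetical placeholders, not the paper's experimental setup.

```python
import tensorflow as tf
from tensorflow_model_remediation import min_diff

# Original task model: a small binary classifier (architecture is illustrative).
original_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Wrap the model with MinDiff: an MMD penalty on the gap between the score
# distributions of the sensitive and non-sensitive groups is added to the loss.
min_diff_model = min_diff.keras.MinDiffModel(
    original_model=original_model,
    loss=min_diff.losses.MMDLoss(),
    loss_weight=1.5,  # strength of the fairness penalty (illustrative value)
)
min_diff_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)

# Pack the main training data together with the two group-specific datasets
# so that every batch carries the examples needed to compute the penalty.
packed_ds = min_diff.keras.utils.pack_min_diff_data(
    original_dataset=train_ds,
    sensitive_group_dataset=sensitive_ds,
    nonsensitive_group_dataset=nonsensitive_ds,
)
min_diff_model.fit(packed_ds, epochs=10)
```

This setup also makes the failure mode described above concrete: once an over-parameterized model fits the training data perfectly, scores on the penalized examples collapse toward their labels in both groups, the distributional gap vanishes on the training data, and the MinDiff term supplies no gradient despite disparities that persist at test time.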
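The suggested remedy is to keep the primary loss away from zero so this degenerate state is never reached. As one hedged illustration, the sketch below implements flooding (Ishida et al., 2020), which reflects the training loss at a flood level b; the `FloodedLoss` wrapper and the flood level 0.05 are illustrative choices, not the paper's implementation. L2 regularization and early stopping can be added through the usual Keras kernel regularizers and callbacks.

```python
import tensorflow as tf

class FloodedLoss(tf.keras.losses.Loss):
    """Flooding (Ishida et al., 2020): keep the batch loss near a floor b > 0.

    The flooded loss is |L - b| + b, so gradients descend when the loss is
    above b and ascend when it dips below, preventing zero training loss.
    """

    def __init__(self, base_loss, flood_level=0.05, name="flooded_loss"):
        super().__init__(name=name)
        self.base_loss = base_loss      # e.g. tf.keras.losses.BinaryCrossentropy()
        self.flood_level = flood_level  # hypothetical hyper-parameter value

    def call(self, y_true, y_pred):
        loss = self.base_loss(y_true, y_pred)  # reduced (scalar) batch loss
        return tf.abs(loss - self.flood_level) + self.flood_level

# Used as the primary loss when compiling the MinDiff-wrapped model above:
# min_diff_model.compile(
#     optimizer="adam",
#     loss=FloodedLoss(tf.keras.losses.BinaryCrossentropy()),
#     metrics=["accuracy"],
# )
```

Because the task loss can no longer reach zero, the model retains non-trivial training errors, the group-wise disparity measured by MinDiff stays informative, and its penalty continues to provide a training signal in the over-parameterized regime.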