Paper Title

Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases

Authors

Mazda Moayeri, Wenxiao Wang, Sahil Singla, Soheil Feizi

Abstract

We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating $5000$ class-feature dependencies ($630$ of which we find to be spurious) and generating a dataset of $325k$ soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for $89$ diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
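The ranking and bias metric described above lend themselves to a short illustration. The following is a minimal, hypothetical sketch rather than the authors' released code: it assumes we already have, for one class, the activations of that class's identified spurious neural features and the model's per-image correctness (both replaced here by synthetic tensors), ranks images by mean spurious-feature activation as a stand-in spuriosity score, and reports the accuracy gap between the highest- and lowest-spuriosity deciles.

```python
# Hypothetical sketch of spuriosity ranking and the high/low-spuriosity accuracy gap.
# Feature activations and correctness labels are synthetic stand-ins; the paper instead
# uses spurious neural features of an interpretable network evaluated on ImageNet.
import torch

torch.manual_seed(0)

n_images, n_spurious_feats = 200, 4               # images in one class; spurious features for that class
feats = torch.rand(n_images, n_spurious_feats)    # activation of each spurious feature per image (assumed given)
correct = torch.rand(n_images) < 0.85             # whether the model classified each image correctly (assumed given)

# Spuriosity proxy: mean activation of the class's spurious neural features.
spuriosity = feats.mean(dim=1)
order = spuriosity.argsort()                      # ascending: low -> high spuriosity

k = n_images // 10                                # bottom/top deciles
low_idx, high_idx = order[:k], order[-k:]

low_acc = correct[low_idx].float().mean().item()
high_acc = correct[high_idx].float().mean().item()
bias_gap = high_acc - low_acc                     # class-wise bias: accuracy gap, high minus low spuriosity

print(f"accuracy on high-spuriosity images: {high_acc:.3f}")
print(f"accuracy on low-spuriosity images:  {low_acc:.3f}")
print(f"bias (accuracy gap):                {bias_gap:.3f}")
```

Under the same assumptions, the debiasing step from the abstract would amount to finetuning only the model's classification head on the low-spuriosity subset (the low_idx images here) while keeping the feature extractor frozen.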
