Paper Title
Measuring Fairness of Text Classifiers via Prediction Sensitivity
Paper Authors
Paper Abstract
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation: ACCUMULATED PREDICTION SENSITIVITY, which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and with individual fairness. It also correlates well with humans' perception of fairness. We conduct experiments on two text classification datasets, JIGSAW TOXICITY and BIAS IN BIOS, and evaluate the correlation between the metrics and manual annotations of whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
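To illustrate the general idea of prediction sensitivity described in the abstract, the following is a minimal sketch, not the paper's actual formulation: it assumes sensitivity is approximated by the gradient magnitude of the model's predicted probability with respect to an input feature that encodes the protected attribute. The model, function names, and the choice of aggregating over the top predicted class are all illustrative assumptions.

```python
# Hypothetical sketch: gradient-based prediction sensitivity to a
# protected-attribute feature. Not the paper's official implementation.
import torch
import torch.nn as nn


class ToyClassifier(nn.Module):
    """A small stand-in classifier over dense input features."""

    def __init__(self, num_features: int, num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 16),
            nn.ReLU(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def prediction_sensitivity(model: nn.Module,
                           x: torch.Tensor,
                           protected_idx: int) -> torch.Tensor:
    """Per-example gradient magnitude of the top predicted probability
    with respect to the feature at protected_idx (an assumed proxy for
    how strongly the prediction depends on the protected attribute)."""
    x = x.clone().requires_grad_(True)
    probs = torch.softmax(model(x), dim=-1)
    # Sensitivity of each example's highest predicted class probability.
    top_prob = probs.max(dim=-1).values
    grads = torch.autograd.grad(top_prob.sum(), x)[0]
    return grads[:, protected_idx].abs()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyClassifier(num_features=8)
    batch = torch.randn(4, 8)
    # Assume feature 0 encodes membership in a protected group.
    print(prediction_sensitivity(model, batch, protected_idx=0))
```

A per-example score like this could then be averaged over a dataset or compared across groups; how such scores are accumulated and compared to counterfactual fairness metrics is specified in the paper itself.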