Paper title
Null-sampling for Interpretable and Fair Representations
Paper authors
Paper abstract
We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness. Invariance implies a selectivity for high-level, relevant correlations w.r.t. class label annotations, and a robustness to irrelevant correlations with protected characteristics such as race or gender. We introduce a non-trivial setup in which the training set exhibits a strong bias such that class label annotations are irrelevant and spurious correlations cannot be distinguished. To address this problem, we introduce an adversarially trained model with a null-sampling procedure to produce invariant representations in the data domain. To enable disentanglement, a partially-labelled representative set is used. By placing the representations into the data domain, the changes made by the model are easily examinable by human auditors. We show the effectiveness of our method on both image and tabular datasets: Coloured MNIST, CelebA, and the Adult dataset.