Paper Title
Fairness via Adversarial Attribute Neighbourhood Robust Learning
Paper Authors
Paper Abstract
Improving fairness between privileged and less-privileged sensitive attribute groups (e.g., {race, gender}) has attracted considerable attention. To help the model perform uniformly well across different sensitive attributes, we propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head and promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN is to mitigate the differences in biased representations between sensitive attribute groups by assigning each sample an adversarially robust weight, which is defined on the representations of its adversarial attribute neighbors, i.e., the samples from different protected groups. To provide efficient optimization algorithms, we cast RAAN as a sum of coupled compositional functions and propose a stochastic algorithm framework, SCRAAN, with adaptive (Adam-style) and non-adaptive (SGD-style) variants and provable theoretical guarantees. Extensive empirical studies on fairness-related benchmark datasets verify the effectiveness of the proposed method.