Paper Title


TESTSGD: Interpretable Testing of Neural Networks Against Subtle Group Discrimination

Authors

Mengdi Zhang, Jun Sun, Jingyi Wang, Bing Sun

Abstract


Discrimination has been shown in many machine learning applications, which calls for sufficient fairness testing before their deployment in ethics-relevant domains such as face recognition, medical diagnosis, and criminal sentencing. Existing fairness testing approaches are mostly designed for identifying individual discrimination, i.e., discrimination against individuals. Yet testing against group discrimination, another widely concerning and mostly hidden type of discrimination, is much less studied. To address this gap, in this work we propose TESTSGD, an interpretable testing approach which systematically identifies and measures hidden (which we call 'subtle') group discrimination of a neural network, characterized by conditions over combinations of the sensitive features. Specifically, given a neural network, TESTSGD first automatically generates an interpretable rule set which categorizes the input space into two groups, exposing the model's group discrimination. Alongside, TESTSGD also provides an estimated group fairness score, based on sampling the input space, to measure the degree of the identified subtle group discrimination; this estimate is guaranteed to be accurate up to an error bound. We evaluate TESTSGD on multiple neural network models trained on popular datasets, including both structured data and text data. The experimental results show that TESTSGD is effective and efficient in identifying and measuring such subtle group discrimination that has never been revealed before. Furthermore, we show that the testing results of TESTSGD can guide the generation of new samples to mitigate such discrimination through retraining, with a negligible accuracy drop.
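The abstract describes estimating a group fairness score by sampling the input space, with accuracy guaranteed up to an error bound. A minimal sketch of that idea, not the authors' implementation: the function names (`required_samples`, `estimated_fairness_score`), the Hoeffding-style sample-size bound, and the toy model/rule are all assumptions added for illustration. The score here is the absolute difference in favorable-outcome rates between the two groups induced by a rule.

```python
import math
import random

def required_samples(error: float, confidence: float) -> int:
    # Hoeffding-style bound (an assumption, not the paper's exact formula):
    # n >= ln(2 / (1 - confidence)) / (2 * error^2) samples suffice so that
    # an empirical mean of a [0, 1] variable is within `error` of its true
    # value with the given confidence.
    return math.ceil(math.log(2 / (1 - confidence)) / (2 * error ** 2))

def estimated_fairness_score(model, rule, sample_input, n: int) -> float:
    """Estimate |P(favorable | rule holds) - P(favorable | rule fails)|
    by drawing n inputs and splitting them into the two groups the rule defines."""
    favorable = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for _ in range(n):
        x = sample_input()
        group = bool(rule(x))          # which of the two groups x falls into
        total[group] += 1
        if model(x) == 1:              # 1 = favorable outcome (assumption)
            favorable[group] += 1
    rates = {g: (favorable[g] / total[g]) if total[g] else 0.0
             for g in (True, False)}
    return abs(rates[True] - rates[False])

# Toy demonstration: a model that decides purely on feature 0, and a rule
# that groups inputs by that same feature, yields a maximally unfair score.
random.seed(0)
n = required_samples(error=0.05, confidence=0.99)
score = estimated_fairness_score(
    model=lambda x: x[0],
    rule=lambda x: x[0] == 1,
    sample_input=lambda: [random.randint(0, 1) for _ in range(3)],
    n=n,
)
```

With this toy setup the score comes out at 1.0 (complete discrimination on the grouped feature); a fair model would drive it toward 0. The design point is that the sample size `n` is fixed up front from the desired error and confidence, which is what makes the estimate "accurate up to an error bound".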
