Paper title
Causal Feature Selection for Algorithmic Fairness
Paper authors
Abstract
The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps in generating high-quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach that identifies a sub-collection of features that ensures the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to reduce the complexity of the approach. We theoretically prove the correctness of the proposed algorithm in identifying features that ensure interventional fairness, and show that a sub-linear number of conditional independence tests is sufficient to identify these variables. A detailed empirical evaluation on real-world datasets demonstrates the efficacy and efficiency of our technique.
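To illustrate why group testing can reduce the number of conditional independence tests, the following is a minimal sketch of generic adaptive group testing (binary splitting). It is not the paper's algorithm. The function name `select_admissible`, the oracle `group_is_independent`, and the feature names are all hypothetical. The oracle stands in for a real conditional independence test, and the sketch assumes that a group passes the test exactly when none of its features is biased.

```python
from typing import Callable, List, Sequence, Tuple


def select_admissible(
    candidates: Sequence[str],
    group_is_independent: Callable[[Sequence[str]], bool],
) -> Tuple[List[str], int]:
    """Return (features kept, number of oracle calls).

    `group_is_independent` is a hypothetical stand-in for a conditional
    independence test applied to a whole group of features at once.
    Assumption: it returns True exactly when no feature in the group
    is biased.
    """
    admissible: List[str] = []
    calls = 0
    stack = [list(candidates)]
    while stack:
        group = stack.pop()
        if not group:
            continue
        calls += 1
        if group_is_independent(group):
            # The whole group passed a single test: keep every feature.
            admissible.extend(group)
        elif len(group) > 1:
            # The group failed: split it and test each half separately.
            mid = len(group) // 2
            stack.append(group[mid:])
            stack.append(group[:mid])
        # A single feature that fails the test is dropped.
    return admissible, calls


# Toy usage: 16 hypothetical features, one of which is biased.
features = [f"f{i}" for i in range(16)]
biased = {"f3"}  # made-up ground truth, used only to simulate the oracle


def oracle(group: Sequence[str]) -> bool:
    return not any(f in biased for f in group)


kept, n_tests = select_admissible(features, oracle)
print(kept, n_tests)
```

In this toy run the biased feature is isolated with fewer oracle calls than the 16 needed to test each feature individually. The saving depends on biased features being rare; when most features are biased, splitting can cost more tests than testing one at a time.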