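Protected Attributes Tell Us Who, Behavior Tells Us How: A Comparison of Demographic and Behavioral Oversampling for Fair Student Success Modeling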

Paper title

Protected Attributes Tell Us Who, Behavior Tells Us How: A Comparison of Demographic and Behavioral Oversampling for Fair Student Success Modeling

Authors

Cock, Jade Maï; Bilal, Muhammad; Davis, Richard; Marras, Mirko; Käser, Tanja

Abstract

Algorithms deployed in education can shape the learning experience and success of a student. It is therefore important to understand whether and how such algorithms might create inequalities or amplify existing biases. In this paper, we analyze the fairness of models that use behavioral data to identify at-risk students and suggest two novel pre-processing approaches for bias mitigation. Based on the concept of intersectionality, the first approach involves intelligent oversampling on combinations of demographic attributes. The second approach does not require any knowledge of demographic attributes and is based on the assumption that such attributes are a (noisy) proxy for student behavior. We hence propose to directly oversample different types of behaviors identified in a cluster analysis. We evaluate our approaches on data from (i) an open-ended learning environment and (ii) a flipped classroom course. Our results show that both approaches can mitigate model bias. Directly oversampling on behavior is a valuable alternative when demographic metadata is not available. Source code and extended results are available at https://github.com/epfl-ml4ed/behavioral-oversampling.
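
The two pre-processing ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the linked repository); it is plain random oversampling with replacement, under the assumption that each student record already carries either a behavior cluster label or a set of demographic attributes. The function name `oversample_by_group`, the record layout, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def oversample_by_group(rows, group_of, seed=0):
    """Duplicate rows at random until every group matches the largest group.

    `group_of` maps a row to its group key: a behavior cluster id, or a
    tuple of demographic attributes for the intersectional variant.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical toy records: (student_id, behavior_cluster, gender, country)
students = [
    ("s1", 0, "f", "A"), ("s2", 0, "m", "A"), ("s3", 0, "m", "A"),
    ("s4", 0, "m", "B"), ("s5", 1, "f", "B"), ("s6", 2, "m", "A"),
]

# Behavioral variant: balance the clusters found by some clustering step.
by_behavior = oversample_by_group(students, group_of=lambda r: r[1])

# Demographic variant: balance combinations of protected attributes.
by_demographics = oversample_by_group(students, group_of=lambda r: (r[2], r[3]))
```

In a sketch like this, the only difference between the two variants is the grouping key. Oversampling of this kind would normally be applied to the training split only, so that evaluation data keeps its original distribution.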
