Paper Title
Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in English, Spanish, and Arabic
Paper Authors
Paper Abstract
As natural language processing systems become more widespread, it is necessary to address fairness issues in their implementation and deployment to ensure that their negative impacts on society are understood and minimized. However, there is limited work that studies fairness using a multilingual and intersectional framework or on downstream tasks. In this paper, we introduce four multilingual Equity Evaluation Corpora, supplementary test sets designed to measure social biases, and a novel statistical framework for studying unisectional and intersectional social biases in natural language processing. We use these tools to measure gender, racial, ethnic, and intersectional social biases across five models trained on emotion regression tasks in English, Spanish, and Arabic. We find that many systems demonstrate statistically significant unisectional and intersectional social biases.