Paper title
"Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings
Paper authors
Paper abstract
With the rapid development and deployment of artificial intelligence applications, research into the fairness of these algorithms has increased. Specifically, in the natural language processing domain, it has been shown that social biases persist in word embeddings, which therefore risk amplifying these biases when used. As an example of social bias, religious biases are shown to persist in word embeddings, and the need for their removal is highlighted. This paper investigates three state-of-the-art multiclass debiasing techniques: Hard debiasing, SoftWEAT debiasing, and Conceptor debiasing. It evaluates their performance in removing religious bias on a common basis by quantifying bias removal via the Word Embedding Association Test (WEAT), Mean Average Cosine Similarity (MAC), and the Relative Negative Sentiment Bias (RNSB). By investigating religious bias removal on three widely used word embeddings, namely Word2Vec, GloVe, and ConceptNet, it is shown that the preferred method is Conceptor debiasing. Specifically, this technique decreases the measured religious bias on average by 82.42%, 96.78%, and 54.76% for the three word embedding sets, respectively.
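To make the first of the three metrics concrete, the following is a minimal sketch of the WEAT effect size as it is commonly defined: the difference between the mean associations of two target word sets with two attribute sets, divided by the standard deviation of the associations over all target words. It is not taken from the paper. The function names, the choice of the sample standard deviation, and the 2-D toy vectors are illustrative assumptions, and a real evaluation would use the Word2Vec, GloVe, or ConceptNet vectors instead.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    sim_a = mean(cosine(w, a) for a in attrs_a)
    sim_b = mean(cosine(w, b) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations, normalised by the spread over all targets.

    Assumption: the sample standard deviation (n - 1) is used here;
    some implementations use the population standard deviation instead.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy vectors, invented purely for illustration.
X = [[1.0, 0.1], [0.9, 0.2]]  # target set 1 (e.g. terms for one religion)
Y = [[0.1, 1.0], [0.2, 0.9]]  # target set 2 (e.g. terms for another religion)
A = [[1.0, 0.0]]              # attribute set 1 (e.g. pleasant words)
B = [[0.0, 1.0]]              # attribute set 2 (e.g. unpleasant words)

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

An effect size near zero indicates that the two target sets are associated with the attribute sets to a similar degree, so a debiasing method can be judged by how far it moves this value toward zero.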