Paper Title

Gender bias in (non)-contextual clinical word embeddings for stereotypical medical categories

Authors

Gizem Sogancioglu, Fabian Mijsters, Amar van Uden, Jelle Peperzak

Abstract

Clinical word embeddings are extensively used in various Bio-NLP problems as a state-of-the-art feature vector representation. Although they are quite successful at representing the semantics of words, they might exhibit gender stereotypes, because the datasets on which they are trained potentially carry statistical and societal bias. This study analyzes the gender bias of clinical embeddings for three medical categories: mental disorders, sexually transmitted diseases, and personality traits. To this end, we analyze two different pre-trained embeddings, namely (contextualized) clinical-BERT and (non-contextualized) BioWordVec. We show that both embeddings are biased towards sensitive gender groups, but BioWordVec exhibits a higher bias than clinical-BERT for all three categories. Moreover, our analyses show that clinical embeddings carry a high degree of bias for some medical terms and diseases, which conflicts with the medical literature. Such ill-founded associations might cause harm in downstream applications that use clinical embeddings.
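
The abstract does not say which bias metric the authors use. As a rough illustration only, one common way to quantify a term's gender association in a static embedding space is the difference between its cosine similarity to a male anchor word and to a female anchor word. The sketch below assumes that approach; the anchor words, the term names, and the 3-dimensional vectors are all invented for the example and are not taken from BioWordVec, clinical-BERT, or the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_association(term_vec, male_vec, female_vec):
    """Positive: term sits closer to the male anchor.
    Negative: closer to the female anchor. Zero: equidistant."""
    return cosine(term_vec, male_vec) - cosine(term_vec, female_vec)


# Toy vectors, made up for illustration (assumption: real embeddings
# would be loaded from a pre-trained model and have hundreds of dimensions).
toy = {
    "he": [1.0, 0.0, 0.0],
    "she": [0.0, 1.0, 0.0],
    "term_a": [0.8, 0.2, 0.1],  # hypothetical term leaning towards "he"
    "term_b": [0.3, 0.3, 0.9],  # hypothetical term equidistant from both
}

scores = {
    term: gender_association(toy[term], toy["he"], toy["she"])
    for term in ("term_a", "term_b")
}

for term, score in scores.items():
    print(f"{term}: {score:+.3f}")
```

With real embeddings, the same score could be averaged over the terms in a category (for example, a list of mental disorders) to compare two models. For a contextualized model such as clinical-BERT, a term has no single fixed vector, so a different procedure would be needed; that case is not covered by this sketch.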
