Paper Title

Markedness in Visual Semantic AI

Authors

Robert Wolfe, Aylin Caliskan

Abstract

We evaluate the state-of-the-art multimodal "visual semantic" model CLIP ("Contrastive Language Image Pretraining") for biases related to the marking of age, gender, and race or ethnicity. Given the option to label an image as "a photo of a person" or to select a label denoting race or ethnicity, CLIP chooses the "person" label 47.9% of the time for White individuals, compared with 5.0% or less for individuals who are Black, East Asian, Southeast Asian, Indian, or Latino or Hispanic. The model is more likely to rank the unmarked "person" label higher than labels denoting gender for Male individuals (26.7% of the time) vs. Female individuals (15.2% of the time). Age affects whether an individual is marked by the model: Female individuals under the age of 20 are more likely than Male individuals to be marked with a gender label, but less likely to be marked with an age label, while Female individuals over the age of 40 are more likely to be marked based on age than Male individuals. We also examine the self-similarity (mean pairwise cosine similarity) for each social group, where higher self-similarity denotes greater attention directed by CLIP to the shared characteristics (age, race, or gender) of the social group. As age increases, the self-similarity of representations of Female individuals increases at a higher rate than for Male individuals, with the disparity most pronounced at the "more than 70" age range. All ten of the most self-similar social groups are individuals under the age of 10 or over the age of 70, and six of the ten are Female individuals. Existing biases of self-similarity and markedness between Male and Female gender groups are further exacerbated when the groups compared are individuals who are White and Male and individuals who are Black and Female. Results indicate that CLIP reflects the biases of the language and society which produced its training data.
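The two measurements named in the abstract can be illustrated with a small sketch. The code below is a hypothetical illustration, not the authors' code. It assumes each social group is represented by a matrix of precomputed embeddings (for example, CLIP image embeddings, one row per image). It computes self-similarity as the mean cosine similarity over all distinct pairs. It also assumes the model "chooses" a label when that label gets the highest image-text score among the candidate labels. The function names `self_similarity` and `unmarked_rate` are invented for this example, and the toy vectors stand in for real model outputs.

```python
import numpy as np


def self_similarity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity over all distinct pairs (i < j).

    Hypothetical helper: `embeddings` is an (n, d) array, one row per image
    of a single social group (assumed to be precomputed model embeddings).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two embeddings to form a pair")
    # L2-normalize each row so that dot products equal cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    # Keep only the upper triangle above the diagonal: each distinct pair once,
    # excluding the trivial self-pairs (which always have similarity 1).
    rows, cols = np.triu_indices(n, k=1)
    return float(sims[rows, cols].mean())


def unmarked_rate(scores: np.ndarray, person_idx: int = 0) -> float:
    """Fraction of images whose top-scoring label is the unmarked "person" label.

    Hypothetical helper: `scores` is an (n_images, n_labels) array of
    image-text similarity scores; column `person_idx` is assumed to hold the
    score for "a photo of a person", the other columns the marked labels.
    """
    s = np.asarray(scores)
    return float(np.mean(s.argmax(axis=1) == person_idx))


# Toy example: two identical vectors and one orthogonal vector.
# Pair similarities are 1, 0, 0, so the mean is 1/3.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(self_similarity(emb))  # ≈ 0.333

# Toy scores: column 0 = "person", column 1 = a marked label (e.g. a race label).
scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(unmarked_rate(scores))  # ≈ 0.667 ("person" wins for 2 of 3 images)
```

Under this reading, a higher `self_similarity` value for a group means its embeddings cluster more tightly. A lower `unmarked_rate` means the model more often prefers a label that marks the group over the generic "person" label.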