Paper Title
Fairness Indicators for Systematic Assessments of Visual Feature Extractors
Paper Authors
Paper Abstract
Does everyone equally benefit from computer vision systems? Answers to this question become increasingly important as computer vision systems are deployed at large scale, and they can spark major concerns when the systems exhibit vast performance discrepancies between people from various demographic and social backgrounds. Systematic diagnosis of the fairness, harms, and biases of computer vision systems is an important step towards building socially responsible systems. To initiate an effort towards standardized fairness audits, we propose three fairness indicators, which aim at quantifying the harms and biases of visual systems. Our indicators use existing publicly available datasets collected for fairness evaluations, and focus on the three main types of harms and biases identified in the literature, namely harmful label associations, disparity in learned representations of social and demographic traits, and biased performance on geographically diverse images from across the world. We define precise experimental protocols applicable to a wide range of computer vision models. These indicators are part of an ever-evolving suite of fairness probes and are not intended to be a substitute for a thorough analysis of the broader impact of new computer vision technologies. Yet, we believe they are a necessary first step towards (1) facilitating the widespread adoption and mandating of fairness assessments in computer vision research, and (2) tracking progress towards building socially responsible models. To study the practical effectiveness and broad applicability of our proposed indicators to any visual system, we apply them to off-the-shelf models built using widely adopted model training paradigms, which vary in whether they can predict labels on a given image or only produce embeddings. We also systematically study the effect of data domain and model size.
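To make the third indicator concrete, the sketch below shows one simple way a performance-disparity measurement across demographic or geographic groups could be computed. This is a minimal illustration only: the function name group_accuracy_gap, the max-minus-min gap statistic, and the toy region labels are assumptions for exposition, not the paper's actual protocol or metric definitions.

    # Illustrative sketch (not the paper's protocol): measure how model accuracy
    # varies across groups, e.g. geographic regions in a worldwide image dataset.
    from collections import defaultdict

    def group_accuracy_gap(predictions, labels, groups):
        """Return per-group accuracy and the max-min accuracy gap.

        predictions, labels: sequences of predicted / true class ids.
        groups: sequence of group identifiers (e.g., region or income bucket).
        """
        correct = defaultdict(int)
        total = defaultdict(int)
        for pred, label, group in zip(predictions, labels, groups):
            total[group] += 1
            correct[group] += int(pred == label)
        per_group = {g: correct[g] / total[g] for g in total}
        gap = max(per_group.values()) - min(per_group.values())
        return per_group, gap

    # Toy usage with hypothetical region tags: a large gap flags a model whose
    # performance is biased towards images from one part of the world.
    preds = [0, 1, 1, 0, 1, 0]
    truth = [0, 1, 0, 0, 0, 1]
    region = ["EU", "EU", "EU", "Africa", "Africa", "Africa"]
    print(group_accuracy_gap(preds, truth, region))
    # -> ({'EU': 0.667, 'Africa': 0.333}, 0.333)

A gap near zero would indicate comparable performance across groups; the paper's indicators additionally cover label associations and representation disparities, which this sketch does not attempt to capture.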