Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Paper Title

Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Authors

Chris Dulhanty, Alexander Wong

Abstract

Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals' faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual's face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual's inclusion in training data on a derived system's ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection.
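
The Rank-1 identification metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `cosine_similarity` and `rank1_accuracy`, the use of cosine similarity as the matching score, and the three-dimensional toy embeddings are all assumptions made for illustration. A probe counts as correctly identified only if its single most similar gallery entry carries the same identity, so a distractor that outranks the true match produces a miss.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose top-ranked gallery entry shares their identity.

    probes:  list of (identity, embedding) pairs to identify.
    gallery: list of (identity, embedding) pairs; distractors use the
             identity None, which matches no probe.
    """
    correct = 0
    for probe_id, probe_vec in probes:
        # Rank-1: only the single most similar gallery entry is considered.
        best_id, _ = max(
            gallery, key=lambda entry: cosine_similarity(probe_vec, entry[1])
        )
        if best_id == probe_id:
            correct += 1
    return correct / len(probes)


# Hypothetical toy data: two enrolled identities plus two distractors.
gallery = [
    ("alice", [1.0, 0.0, 0.0]),
    ("bob", [0.0, 1.0, 0.0]),
    (None, [0.0, 0.0, 1.0]),  # distractor
    (None, [0.6, 0.6, 0.5]),  # distractor
]
probes = [
    ("alice", [0.9, 0.1, 0.0]),
    ("bob", [0.1, 0.8, 0.2]),
    ("bob", [0.5, 0.55, 0.5]),  # closest to a distractor, so a miss
]

accuracy = rank1_accuracy(probes, gallery)
print(f"Rank-1 accuracy: {accuracy:.2%}")
```

In an audit like the one described, the same computation would be run separately over probes whose identities appear in the model's training data and probes whose identities do not, and the two accuracies compared.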
