Paper title
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks
Paper authors
Paper abstract
In laboratory object recognition tasks based on undistorted photographs, both adult humans and Deep Neural Networks (DNNs) perform close to ceiling. Unlike adults, whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last two years have seen impressive gains in DNN distortion robustness, predominantly achieved through ever-larger datasets that are orders of magnitude larger than ImageNet. While this simple brute-force approach is very effective in achieving human-level robustness in DNNs, it raises the question of whether human robustness, too, is simply due to extensive experience with (distorted) visual input during childhood and beyond. Here we investigate this question by comparing the core object recognition performance of 146 children (aged 4 to 15) against adults and against DNNs. We find, first, that already 4- to 6-year-olds show remarkable robustness to image distortions and outperform DNNs trained on ImageNet. Second, we estimate the number of "images" children have been exposed to during their lifetime. Compared to various DNNs, children's high robustness requires relatively little data. Third, when recognizing objects, children (like adults but unlike DNNs) rely heavily on shape but not on texture cues. Together, our results suggest that the remarkable robustness to distortions emerges early in the developmental trajectory of human object recognition and is unlikely to be the result of a mere accumulation of experience with distorted visual input. Even though current DNNs match human performance regarding robustness, they seem to rely on different and more data-hungry strategies to do so.