Ultrafast Image Categorization in Biology and Neural Models

Paper title

Ultrafast Image Categorization in Biology and Neural Models

Authors

Jérémie, Jean-Nicolas; Perrinet, Laurent U.

Abstract

Humans are able to categorize images very efficiently, in particular to detect the presence of an animal very quickly. Recently, deep learning algorithms based on convolutional neural networks (CNNs) have achieved higher than human accuracy for a wide range of visual categorization tasks. However, the tasks on which these artificial networks are typically trained and evaluated tend to be highly specialized and do not generalize well, e.g., accuracy drops after image rotation. In this respect, biological visual systems are more flexible and efficient than artificial systems for more general tasks, such as recognizing an animal. To further the comparison between biological and artificial neural networks, we re-trained the standard VGG 16 CNN on two independent tasks that are ecologically relevant to humans: detecting the presence of an animal or an artifact. We show that re-training the network achieves a human-like level of performance, comparable to that reported in psychophysical tasks. In addition, we show that the categorization is better when the outputs of the models are combined. Indeed, animals (e.g., lions) tend to be less present in photographs that contain artifacts (e.g., buildings). Furthermore, these re-trained models were able to reproduce some unexpected behavioral observations from human psychophysics, such as robustness to rotation (e.g., an upside-down or tilted image) or to a grayscale transformation. Finally, we quantified the number of CNN layers required to achieve such performance and showed that good accuracy for ultrafast image categorization can be achieved with only a few layers, challenging the belief that image recognition requires deep sequential analysis of visual objects.
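The abstract states that categorization improves when the outputs of the animal and artifact models are combined, because animals tend to be less present in photographs that contain artifacts. The sketch below illustrates one way such a combination could work. The combination rule (subtracting the artifact logit from the animal logit), the `weight` parameter, and all function names are assumptions made for illustration; the abstract does not specify how the authors combine the outputs.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combined_animal_score(animal_logit: float,
                          artifact_logit: float,
                          weight: float = 1.0) -> float:
    """Hypothetical combination rule (not taken from the paper).

    Evidence for an artifact counts against the presence of an animal,
    so the artifact logit is subtracted before squashing.
    """
    return sigmoid(animal_logit - weight * artifact_logit)


def accuracy(scores, labels, threshold: float = 0.5) -> float:
    """Fraction of scores that fall on the correct side of the threshold."""
    predictions = [score > threshold for score in scores]
    correct = sum(p == label for p, label in zip(predictions, labels))
    return correct / len(labels)
```

With `weight=0.0` the function reduces to the animal detector alone, so the same code can be used to compare single-model and combined accuracy on a labeled set of detector outputs.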
