Paper Title

Bag of Tricks for Long-Tail Visual Recognition of Animal Species in Camera-Trap Images

Authors

Fagner Cunha, Eulanda M. dos Santos, Juan G. Colonna

Abstract

Camera traps are a method for monitoring wildlife, and they collect a large number of pictures. The number of images collected for each species usually follows a long-tail distribution, i.e., a few classes have a large number of instances, while many species account for only a small fraction of the images. Although in most cases these rare species are the ones of interest to ecologists, they are often neglected when deep-learning models are used, because these models require a large number of training images. In this work, a simple and effective framework called Square-Root Sampling Branch (SSB) is proposed to improve long-tail visual recognition. It combines two classification branches trained with square-root sampling and instance sampling, and it is compared with state-of-the-art methods for handling this task: square-root sampling, class-balanced focal loss, and balanced group softmax. To reach a more general conclusion, the methods for handling long-tail visual recognition were systematically evaluated on four families of computer vision models (ResNet, MobileNetV3, EfficientNetV2, and Swin Transformer) and four camera-trap datasets with different characteristics. First, a robust baseline incorporating the most recent training tricks was prepared; then, the methods for improving long-tail recognition were applied. Our experiments show that square-root sampling was the method that most improved the performance for minority classes, by around 15%; however, this came at the cost of reducing the majority classes' accuracy by at least 3%. Our proposed framework (SSB) proved competitive with the other methods and achieved the best or second-best results for the tail classes in most cases; but, unlike square-root sampling, it caused only a minimal loss in head-class performance, thus achieving the best trade-off among all the evaluated methods.
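
To make the difference between instance sampling and square-root sampling concrete, the sketch below uses the common formulation from the long-tail literature in which class j, with n_j training images, is drawn with probability proportional to n_j^q. Under this formulation, q = 1 gives instance sampling (every image equally likely), q = 1/2 gives square-root sampling, and q = 0 gives class-balanced sampling. This is an illustrative assumption about the general technique, not the authors' SSB code. The function names and the toy "deer"/"ocelot" labels are hypothetical. The abstract describes SSB's two branches as trained with square-root sampling and instance sampling, which correspond to q = 1/2 and q = 1 here.

```python
from collections import Counter


def sampling_probabilities(labels, q):
    """Per-class sampling probability p_j = n_j**q / sum_i n_i**q.

    q = 1.0 -> instance sampling (each image equally likely),
    q = 0.5 -> square-root sampling,
    q = 0.0 -> class-balanced sampling (each class equally likely).
    """
    counts = Counter(labels)
    powered = {c: n ** q for c, n in counts.items()}
    total = sum(powered.values())
    return {c: v / total for c, v in powered.items()}


def per_sample_weights(labels, q):
    """One weight per image; drawing images in proportion to these weights
    reproduces the per-class probabilities from sampling_probabilities."""
    counts = Counter(labels)
    class_probs = sampling_probabilities(labels, q)
    # Spread each class's probability mass evenly over that class's images.
    return [class_probs[y] / counts[y] for y in labels]


if __name__ == "__main__":
    # Hypothetical toy dataset: one head class and one tail class.
    labels = ["deer"] * 900 + ["ocelot"] * 100
    for q in (1.0, 0.5, 0.0):
        print(q, sampling_probabilities(labels, q))
```

With 900 head images and 100 tail images, the tail class is drawn 10% of the time under instance sampling, 25% under square-root sampling (sqrt(100) / (sqrt(900) + sqrt(100)) = 10/40), and 50% under class-balanced sampling. Square-root sampling therefore sits between the two extremes, which is consistent with the trade-off the abstract reports. The output of `per_sample_weights` could be passed to a weighted sampler such as PyTorch's `WeightedRandomSampler` to build the batches for a given branch.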