PMODE: Prototypical Mask based Object Dimension Estimation

Paper title

PMODE: Prototypical Mask based Object Dimension Estimation

Authors

Thariq Khalid, Mohammed Yahya Hakami, Riad Souissi

Abstract

Can a neural network estimate an object's dimensions in the wild? In this paper, we propose a method and a deep learning architecture to estimate the dimensions of a quadrilateral object of interest in videos captured with a monocular camera. The proposed technique uses neither camera calibration nor handcrafted geometric features; instead, features are learned during training with the help of the coefficients of a segmentation neural network. A real-time, instance-segmentation-based deep neural network with a ResNet50 backbone is employed; it yields the object's prototype mask and thus provides a region of interest from which to regress the object's dimensions. The instance segmentation network is trained to look only at the nearest object of interest. The regression is performed by an MLP head that looks only at the mask coefficients of the bounding box detector head and the prototype segmentation mask. We trained the system with three different random cameras, achieving a 22% mean absolute percentage error (MAPE) on the test dataset for dimension estimation.
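The abstract reports its result as a MAPE figure. For readers unfamiliar with the metric, the following is a minimal sketch of how mean absolute percentage error is commonly computed. The function name `mape` and the sample dimensions are hypothetical and chosen for illustration only; they are not taken from the paper or its dataset.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a percentage.

    y_true: ground-truth dimensions (all must be non-zero)
    y_pred: predicted dimensions, same length as y_true
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    # Average of the per-sample relative errors |true - pred| / |true|
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical object widths in centimetres (illustrative, not paper data)
true_dims = [100.0, 50.0, 200.0, 80.0]
pred_dims = [120.0, 40.0, 150.0, 100.0]

score = mape(true_dims, pred_dims)
print(f"MAPE: {score:.1f}%")
```

Because each error is divided by the true value, the metric is scale-independent, which makes it a natural fit when the objects being measured vary widely in size.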
