Paper Title
Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
Paper Authors
Paper Abstract
Rotation detection serves as a fundamental building block in many visual applications involving aerial images, scene text, faces, etc. Differing from the dominant regression-based approaches for orientation estimation, this paper explores a relatively less-studied methodology based on classification. The hope is to inherently dismiss the boundary discontinuity issue encountered by regression-based detectors. We propose new techniques to push its frontier in two aspects: i) new encoding mechanism: the design of two Densely Coded Labels (DCL) for angle classification, to replace the Sparsely Coded Label (SCL) in existing classification-based detectors, leading to a three-fold training speed increase as empirically observed across benchmarks, along with a notable improvement in detection accuracy; ii) loss re-weighting: we propose Angle Distance and Aspect Ratio Sensitive Weighting (ADARSW), which improves detection accuracy, especially for square-like objects, by making DCL-based detectors sensitive to angular distance and the object's aspect ratio. Extensive experiments and visual analysis on large-scale public datasets for aerial images, i.e., DOTA, UCAS-AOD, and HRSC2016, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach. The source code is available at https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow and is also integrated in our open-source rotation detection benchmark: https://github.com/yangxue0827/RotationDetection.
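To make the "dense vs. sparse" label distinction concrete, the following is a minimal sketch (not the paper's implementation) of a binary-coded dense angle label. The assumption here is an angle range of [-90°, 90°) discretized into bins of width `omega` degrees; a sparse one-hot label would need one slot per bin, whereas a dense binary code covers all bins with only `num_bits` bits. The function name, angle convention, and bit width are illustrative; the paper's exact encoding (including its Gray-code variant) may differ.

```python
def dense_binary_label(angle_deg, omega=1.0, num_bits=8):
    """Sketch of a Densely Coded Label (binary variant) for an angle.

    Discretizes angle_deg in [-90, 90) into 180/omega bins, then returns
    the bin index as a num_bits-bit binary vector (most significant bit
    first), instead of a one-hot vector of length 180/omega.
    """
    # Shift the angle to [0, 180) and pick its bin index.
    idx = int((angle_deg + 90.0) / omega)
    # Dense label: num_bits bits suffice, since 2**num_bits >= 180/omega.
    return [(idx >> b) & 1 for b in reversed(range(num_bits))]


# Example: with omega=1 there are 180 bins, so 8 bits are enough.
# angle -90.0 -> bin 0  -> [0, 0, 0, 0, 0, 0, 0, 0]
# angle   0.0 -> bin 90 -> [0, 1, 0, 1, 1, 0, 1, 0]
```

The dense code is what makes the classification head compact: its output width grows logarithmically rather than linearly with the number of angle bins.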