Paper Title
Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
Paper Authors
Paper Abstract
Rotation detection serves as a fundamental building block in many visual applications involving aerial images, scene text, faces, etc. Differing from the dominant regression-based approaches for orientation estimation, this paper explores a relatively less-studied methodology based on classification. The hope is to inherently dismiss the boundary discontinuity issue encountered by regression-based detectors. We propose new techniques to push its frontier in two aspects: i) new encoding mechanism: the design of two Densely Coded Labels (DCL) for angle classification, to replace the Sparsely Coded Label (SCL) in existing classification-based detectors, leading to a three-fold training speed increase as empirically observed across benchmarks, along with a notable improvement in detection accuracy; ii) loss re-weighting: we propose Angle Distance and Aspect Ratio Sensitive Weighting (ADARSW), which improves detection accuracy, especially for square-like objects, by making DCL-based detectors sensitive to angular distance and the object's aspect ratio. Extensive experiments and visual analysis on large-scale public datasets for aerial images, i.e., DOTA, UCAS-AOD, and HRSC2016, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach. The source code is available at https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow and is also integrated in our open-source rotation detection benchmark: https://github.com/yangxue0827/RotationDetection.
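To make the "dense vs. sparse" label distinction concrete, the following is a minimal sketch (not the paper's implementation) of a binary-coded dense angle label. The assumption here is an angle range of [-90°, 90°) discretized into bins of width `omega` degrees; a sparse one-hot label would need one slot per bin, whereas a dense binary code covers all bins with only `num_bits` bits. The function name, angle convention, and bit width are illustrative; the paper's exact encoding (including its Gray-code variant) may differ.

```python
def dense_binary_label(angle_deg, omega=1.0, num_bits=8):
    """Sketch of a Densely Coded Label (binary variant) for an angle.

    Discretizes angle_deg in [-90, 90) into 180/omega bins, then returns
    the bin index as a num_bits-bit binary vector (most significant bit
    first), instead of a one-hot vector of length 180/omega.
    """
    # Shift the angle to [0, 180) and pick its bin index.
    idx = int((angle_deg + 90.0) / omega)
    # Dense label: num_bits bits suffice, since 2**num_bits >= 180/omega.
    return [(idx >> b) & 1 for b in reversed(range(num_bits))]


# Example: with omega=1 there are 180 bins, so 8 bits are enough.
# angle -90.0 -> bin 0  -> [0, 0, 0, 0, 0, 0, 0, 0]
# angle   0.0 -> bin 90 -> [0, 1, 0, 1, 1, 0, 1, 0]
```

The dense code is what makes the classification head compact: its output width grows logarithmically rather than linearly with the number of angle bins.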