Paper Title

EDTER: Edge Detection with Transformer

Authors

Mengyang Pu, Yaping Huang, Yuming Liu, Qingji Guan, Haibin Ling

Abstract

Convolutional neural networks have made significant progress in edge detection by progressively exploring context and semantic features. However, local details are gradually suppressed as receptive fields enlarge. Recently, the vision transformer has shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges by exploiting the full image context information and detailed local cues simultaneously. EDTER works in two stages. In Stage I, a global transformer encoder captures long-range global context on coarse-grained image patches. Then, in Stage II, a local transformer encoder works on fine-grained patches to excavate short-range local cues. Each transformer encoder is followed by an elaborately designed Bi-directional Multi-Level Aggregation decoder to achieve high-resolution features. Finally, the global context and local cues are combined by a Feature Fusion Module and fed into a decision head for edge prediction. Extensive experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER in comparison with state-of-the-art methods.
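To make the coarse-grained versus fine-grained distinction concrete, the following is a minimal sketch of non-overlapping patch splitting at two granularities. It is only an illustration, not the authors' implementation: the helper name `split_into_patches`, the toy 32x32 image, and the patch sizes 16 and 8 are all assumptions chosen for the example, and the transformer encoders, decoder, and fusion module are omitted entirely.

```python
def split_into_patches(image, patch):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch tiles.

    Hypothetical helper for illustration. Returns flattened patches in
    row-major order and assumes both image dimensions are divisible by `patch`.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile into a single token-like vector.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


# Toy 32x32 single-channel image; each pixel value encodes its position.
image = [[r * 32 + c for c in range(32)] for r in range(32)]

# Assumed patch sizes: large patches for the global stage, small for the local stage.
coarse_tokens = split_into_patches(image, 16)
fine_tokens = split_into_patches(image, 8)

print(len(coarse_tokens), len(coarse_tokens[0]))  # 4 256
print(len(fine_tokens), len(fine_tokens[0]))      # 16 64
```

Halving the patch side quadruples the number of tokens while each token covers a quarter of the area. This is the trade-off the two-stage design appears to exploit: few large patches for global context, many small patches for local detail.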
