Paper Title


Sequence-to-Sequence Contrastive Learning for Text Recognition

Authors

Aviad Aberdam, Ron Litman, Shahar Tsiper, Oron Anschel, Ron Slossberg, Shai Mazor, R. Manmatha, Pietro Perona

Abstract


We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast in a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
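The key idea above is that each feature map is divided into several instances so the contrastive loss is computed at the sub-word level rather than once per image. The sketch below illustrates this with NumPy: an average-pooling instance-mapping step followed by a standard NT-Xent contrastive loss between two augmented views. This is a minimal illustration of the general technique, not the paper's implementation; the function names, pooling choice, and temperature are assumptions for the example.

```python
import numpy as np

def to_instances(feature_map, num_instances):
    # Average-pool a sequence of frames [T, C] into num_instances instances,
    # one illustrative way to split a feature map into sub-word instances.
    splits = np.array_split(feature_map, num_instances, axis=0)
    return np.stack([s.mean(axis=0) for s in splits])  # [num_instances, C]

def nt_xent(za, zb, temperature=0.5):
    # NT-Xent contrastive loss between two views.
    # za, zb: [N, C] instances from two augmentations of the same image.
    # Matching rows are positive pairs; all other rows act as negatives.
    z = np.concatenate([za, zb], axis=0)              # [2N, C]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity prep
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = za.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()

# Two augmented views of one text image yield two feature maps; each is
# mapped to 5 instances, giving 5 positive pairs and many negatives.
rng = np.random.default_rng(0)
view_a = to_instances(rng.normal(size=(26, 8)), 5)
view_b = to_instances(rng.normal(size=(26, 8)), 5)
loss = nt_xent(view_a, view_b)
```

Because the loss is computed over instances rather than whole images, a single image pair already contributes several positives and negatives, which is what enables the sub-word-level contrast described in the abstract.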
