Paper Title

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoder for Text Recognition and Document Enhancement

Authors

Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

Abstract


In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labeled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at~\url{http://Upon_Acceptance}.
