Paper Title

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Authors

Dezhi Peng, Lianwen Jin, Yuliang Liu, Canjie Luo, Songxuan Lai

Abstract

Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies solely focus on the recognition of cropped text line images, ignoring the error caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they either are limited to simple layouts or require very detailed annotations including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which is more robust and flexible when dealing with complex layouts including multi-directional and curved text lines. Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines. Extensive experiments conducted on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
