Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Paper Title

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

Authors

Ma, Chixiang; Lin, Weihong; Sun, Lei; Huo, Qiang

Abstract

We introduce a new table detection and structure recognition approach named RobusTabNet, which detects the boundaries of tables and reconstructs the cellular structure of each table from heterogeneous document images. For table detection, we propose to use CornerNet as a new region proposal network to generate higher-quality table proposals for Faster R-CNN, which significantly improves the localization accuracy of Faster R-CNN for table detection. Consequently, our table detection approach achieves state-of-the-art performance on three public table detection benchmarks, namely cTDaR TrackA, PubLayNet and IIIT-AR-13K, using only a lightweight ResNet-18 backbone network. Furthermore, we propose a new split-and-merge based table structure recognition approach, in which a novel spatial CNN based separation line prediction module splits each detected table into a grid of cells, and a Grid CNN based cell merging module recovers the spanning cells. As the spatial CNN module can effectively propagate contextual information across the whole table image, our table structure recognizer can robustly recognize tables with large blank spaces as well as geometrically distorted (even curved) tables. Thanks to these two techniques, our table structure recognition approach achieves state-of-the-art performance on three public benchmarks, namely SciTSR, PubTabNet and cTDaR TrackB2-Modern. Moreover, we further demonstrate the advantages of our approach in recognizing tables with complex structures, large blank spaces, and geometrically distorted or even curved shapes on a more challenging in-house dataset.
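The split-and-merge idea can be made concrete with a minimal Python sketch of the two stages' post-processing. It is not the authors' implementation and rests on two assumptions. First, the separation lines are straight and axis-aligned and are given as pixel coordinates. Second, the merge decisions arrive as boolean flags between adjacent grid elements. The names `split_into_grid`, `merge_cells`, `merge_right`, `merge_down`, and `Cell` are hypothetical. In RobusTabNet the separators come from the spatial CNN module and the merge decisions from the Grid CNN module. The paper also handles curved separators, which this sketch does not.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class Cell:
    row_start: int
    row_end: int    # inclusive
    col_start: int
    col_end: int    # inclusive
    box: Box


def split_into_grid(row_lines, col_lines, width, height):
    """Split step: turn separator positions into a grid of basic cell boxes.

    Assumes straight, axis-aligned separators (a simplification): row_lines
    are y-coordinates, col_lines are x-coordinates inside the table image.
    """
    ys = [0.0, *sorted(float(y) for y in row_lines), float(height)]
    xs = [0.0, *sorted(float(x) for x in col_lines), float(width)]
    return [[(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
            for r in range(len(ys) - 1)]


def merge_cells(grid, merge_right, merge_down):
    """Merge step: group grid elements that form one spanning cell.

    merge_right[r][c]: does element (r, c) merge with (r, c + 1)?
    merge_down[r][c]:  does element (r, c) merge with (r + 1, c)?
    These flags are hypothetical stand-ins for a merging module's output.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    parent = list(range(n_rows * n_cols))  # union-find over flattened indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(n_rows):
        for c in range(n_cols):
            idx = r * n_cols + c
            if c + 1 < n_cols and merge_right[r][c]:
                union(idx, idx + 1)
            if r + 1 < n_rows and merge_down[r][c]:
                union(idx, idx + n_cols)

    groups = {}
    for r in range(n_rows):
        for c in range(n_cols):
            groups.setdefault(find(r * n_cols + c), []).append((r, c))

    cells = []
    for members in groups.values():
        r0 = min(r for r, _ in members)
        r1 = max(r for r, _ in members)
        c0 = min(c for _, c in members)
        c1 = max(c for _, c in members)
        # A valid spanning cell must cover a full rectangle of grid elements.
        if len(members) != (r1 - r0 + 1) * (c1 - c0 + 1):
            raise ValueError(f"non-rectangular merge group: {sorted(members)}")
        x0, y0 = grid[r0][c0][0], grid[r0][c0][1]
        x1, y1 = grid[r1][c1][2], grid[r1][c1][3]
        cells.append(Cell(r0, r1, c0, c1, (x0, y0, x1, y1)))
    return sorted(cells, key=lambda cell: (cell.row_start, cell.col_start))


if __name__ == "__main__":
    # 2 x 3 table; the two left header elements form one spanning cell.
    grid = split_into_grid([50], [100, 200], width=300, height=100)
    cells = merge_cells(grid,
                        merge_right=[[True, False], [False, False]],
                        merge_down=[[False, False, False]])
    for cell in cells:
        print(cell)
```

The design keeps the two stages separate, as the split-and-merge formulation suggests. Splitting always yields a complete rectangular grid. Merging only ever unions neighbouring grid elements, so every spanning cell is expressed as a row/column span over that grid. Union-find makes chains of pairwise merge flags, such as a cell spanning three columns, resolve into a single cell.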
