Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Paper Title

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Authors

Luo, Siwen; Ding, Yihao; Long, Siqu; Poon, Josiah; Han, Soyeon Caren

Abstract

Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents while ignoring other information that is vital to capture, such as context information or the relations between document components. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
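
The pipeline outlined in the abstract (one graph per aspect, a graph convolution over each, pooling, then a 2-layer MLP classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the random graphs and features, the layer sizes, the single GCN layer per aspect, and the choice of element-wise max pooling across aspects are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops, so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, feats, weight):
    """One graph convolution followed by ReLU: relu(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ feats @ weight, 0.0)


def classify_components(aspects, params):
    """Hypothetical forward pass: per-aspect GCN, pooling, 2-layer MLP.

    aspects maps an aspect name to (adjacency, node_features); each node
    stands for one document layout component.
    """
    per_aspect = [
        gcn_layer(normalize_adjacency(adj), feats, params["gcn"][name])
        for name, (adj, feats) in aspects.items()
    ]
    # Assumption: integrate aspects by element-wise max pooling over the aspect axis.
    pooled = np.stack(per_aspect, axis=0).max(axis=0)            # (nodes, hidden)
    hidden = np.maximum(pooled @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]                  # (nodes, classes)


rng = np.random.default_rng(0)
num_nodes, hidden_dim, num_classes = 5, 8, 3
# Illustrative feature sizes for the four aspects named in the abstract.
feature_dims = {"syntactic": 6, "semantic": 10, "density": 2, "visual": 12}

aspects, gcn_weights = {}, {}
for name, dim in feature_dims.items():
    upper = np.triu(rng.integers(0, 2, size=(num_nodes, num_nodes)), k=1)
    adjacency = (upper + upper.T).astype(float)   # random undirected toy graph
    aspects[name] = (adjacency, rng.normal(size=(num_nodes, dim)))
    gcn_weights[name] = rng.normal(size=(dim, hidden_dim))

params = {
    "gcn": gcn_weights,
    "w1": rng.normal(size=(hidden_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(size=(hidden_dim, num_classes)),
    "b2": np.zeros(num_classes),
}

logits = classify_components(aspects, params)
predicted_labels = logits.argmax(axis=1)          # one class index per component
```

With untrained random weights the predicted labels carry no meaning; the sketch only shows how the four aspect graphs could flow into a single per-component classifier.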
