Paper Title
Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method
Paper Authors
Paper Abstract
Recent deep learning approaches to table detection have achieved outstanding performance and proved effective at identifying document layouts. Currently available table detection benchmarks have many limitations, including a lack of sample diversity, simple table structures, a lack of training cases, and poor sample quality. In this paper, we introduce a diverse large-scale dataset for table detection with more than seven thousand samples containing a wide variety of table structures collected from many diverse sources. In addition, we present baseline results using a convolutional neural network-based method to detect table structures in documents. Experimental results show the superiority of convolutional deep learning methods over classical computer vision-based methods. The introduction of this diverse table detection dataset will enable the community to develop high-throughput deep learning methods for understanding document layout and processing tabular data. The dataset is available at: 1. https://www.kaggle.com/datasets/mrinalim/stdw-dataset 2. https://huggingface.co/datasets/n3011/STDW