IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

Paper Title

IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

Authors

Ajoy Mondal, Peter Lipps, C. V. Jawahar

Abstract

We introduce a new dataset for graphical object detection in business documents, more specifically annual reports. This dataset, IIIT-AR-13K, is created by manually annotating the bounding boxes of graphical (or page) objects in publicly available annual reports. It contains a total of 13K annotated page images with objects in five popular categories: table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection. Annual reports created in multiple languages over several years by various companies bring high diversity into the dataset. We benchmark IIIT-AR-13K with two state-of-the-art graphical object detection techniques, based on Faster R-CNN [20] and Mask R-CNN [11], and establish high baselines for further research. Our dataset is highly effective as training data for developing practical solutions for graphical object detection in both business documents and technical articles. By training on IIIT-AR-13K, we demonstrate the feasibility of a single solution that, for table detection, achieves superior performance compared with equivalent solutions trained on a much larger amount of data. We hope that our dataset helps advance research on detecting various types of graphical objects in business documents.
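The abstract describes each page image as a set of manually drawn bounding boxes, each labeled with one of five categories. As a minimal sketch of what one such page record could look like, the Python below models it with dataclasses and adds two simple checks (box validity and per-category counts). All class names, field names, and the pixel-coordinate convention are hypothetical: the abstract does not specify the dataset's actual annotation file format.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class GraphicalObject(Enum):
    # The five object categories listed in the abstract.
    TABLE = "table"
    FIGURE = "figure"
    NATURAL_IMAGE = "natural image"
    LOGO = "logo"
    SIGNATURE = "signature"


@dataclass(frozen=True)
class BoundingBox:
    # Axis-aligned box in pixel coordinates of the page image
    # (hypothetical convention: (x_min, y_min) is the top-left corner).
    label: GraphicalObject
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self) -> None:
        # Reject degenerate or inverted boxes as soon as they are created.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError(f"degenerate or inverted box: {self}")

    @property
    def area(self) -> int:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


@dataclass
class PageAnnotation:
    # All manually annotated objects on one page image (hypothetical record).
    image_name: str
    width: int
    height: int
    boxes: list[BoundingBox] = field(default_factory=list)

    def out_of_bounds(self) -> list[BoundingBox]:
        # Boxes that extend beyond the page image; a basic sanity check.
        return [
            b for b in self.boxes
            if b.x_min < 0 or b.y_min < 0 or b.x_max > self.width or b.y_max > self.height
        ]

    def count_by_category(self) -> Counter:
        # Number of annotated objects per category on this page.
        return Counter(b.label for b in self.boxes)


if __name__ == "__main__":
    page = PageAnnotation(
        image_name="annual_report_page.jpg",  # hypothetical file name
        width=1654,
        height=2339,
        boxes=[
            BoundingBox(GraphicalObject.LOGO, 80, 60, 300, 200),
            BoundingBox(GraphicalObject.TABLE, 100, 300, 1500, 900),
            BoundingBox(GraphicalObject.TABLE, 100, 1000, 1500, 1600),
        ],
    )
    counts = page.count_by_category()
    print(counts[GraphicalObject.TABLE])  # 2
    print(page.out_of_bounds())  # []
```

Validating boxes at construction time keeps malformed annotations from silently reaching a detector's training data, and a per-page category count is the kind of statistic one might aggregate to inspect class balance across a dataset like this one.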
