One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Paper title

One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

Paper authors

Java, Abhinav; Deshmukh, Shripad; Aggarwal, Milan; Jandial, Surgan; Sarkar, Mausoom; Krishnamurthy, Balaji

Paper abstract

Active consumption of digital documents has yielded scope for research in various applications, including search. Traditionally, searching within a document has been cast as a text matching problem, ignoring the rich layout and visual cues commonly present in structured documents, forms, etc. To that end, we ask a mostly unexplored question: "Can we search for other similar snippets present in a target document page given a single query instance of a document snippet?" We propose MONOMER to solve this as a one-shot snippet detection task. MONOMER fuses context from the visual, textual, and spatial modalities of snippets and documents to find the query snippet in target documents. We conduct extensive ablations and experiments showing that MONOMER outperforms several baselines from one-shot object detection (BHRL), template matching, and document understanding (LayoutLMv3). Due to the scarcity of relevant data for the task at hand, we train MONOMER on programmatically generated data containing many visually similar query snippet and target document pairs from two datasets: Flamingo Forms and PubLayNet. We also conduct a human study to validate the generated data.
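
The abstract names template matching as one of the baseline families. As a rough illustration of that general idea only (not MONOMER, and not the authors' baseline implementation), the sketch below slides a query snippet over a target page and scores each position by normalized cross-correlation. The function names `ncc` and `match_template`, the grayscale list-of-lists page representation, and the synthetic random page are all assumptions made for this example.

```python
import math
import random


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant patch carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template(page, query):
    """Return (row, col, score) of the best-scoring query placement on the page.

    `page` and `query` are hypothetical grayscale images stored as lists of rows.
    """
    ph, pw = len(page), len(page[0])
    qh, qw = len(query), len(query[0])
    flat_query = [v for row in query for v in row]
    best = (0, 0, -1.0)
    for r in range(ph - qh + 1):
        for c in range(pw - qw + 1):
            window = [page[r + i][c + j] for i in range(qh) for j in range(qw)]
            score = ncc(window, flat_query)
            if score > best[2]:
                best = (r, c, score)
    return best


# Synthetic example: cut a snippet out of a random "page" and locate it again.
random.seed(0)
page = [[random.random() for _ in range(30)] for _ in range(20)]
query = [row[7:12] for row in page[5:9]]
row, col, score = match_template(page, query)
```

Pixel-level matching of this kind only finds near-identical copies of the query. The task described in the abstract asks for snippets that are similar in layout and content rather than identical in pixels, which is the gap the fused visual, textual, and spatial context is meant to address.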
