Paper Title

Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation

Authors

Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan McDonald

Abstract

A major obstacle to the wide-spread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, question-passage relevance pairs that are domain specific. Furthermore, when this is coupled with a simple hybrid term-neural model, first-stage retrieval performance can be improved further. Empirically, we show that this is an effective strategy for building neural passage retrieval models in the absence of large training corpora. Depending on the domain, this technique can even approach the accuracy of supervised models.
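
The two ideas in the abstract, building noisy question-passage pairs from target-domain passages and combining a term-based score with a neural score, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the hybrid model combines its scores, so the linear interpolation, the `weight` parameter, the toy term-overlap scorer, and the function names (`make_synthetic_pairs`, `term_score`, `hybrid_score`) are all assumptions. The question generator and neural scorer are passed in as plain callables.

```python
from collections import Counter
from typing import Callable, List, Tuple


def make_synthetic_pairs(
    passages: List[str],
    generate_question: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each target-domain passage with one generated (noisy) question.

    `generate_question` stands in for a question generator trained on
    general-domain data; it is a hypothetical callable, not a real API.
    """
    return [(generate_question(p), p) for p in passages]


def term_score(query: str, passage: str) -> float:
    """Toy term-overlap count, standing in for a term-based ranker."""
    q_terms = Counter(query.lower().split())
    p_terms = Counter(passage.lower().split())
    return float(sum(min(q_terms[t], p_terms[t]) for t in q_terms))


def hybrid_score(
    query: str,
    passage: str,
    neural_score: Callable[[str, str], float],
    weight: float = 0.5,
) -> float:
    """Assumed combination: linear interpolation of term and neural scores."""
    return weight * term_score(query, passage) + (1.0 - weight) * neural_score(
        query, passage
    )


# Toy usage with placeholder components (both lambdas are illustrative only).
passages = ["aspirin reduces fever", "insulin regulates blood sugar"]
pairs = make_synthetic_pairs(passages, lambda p: "what does " + p.split()[0] + " do")
score = hybrid_score("aspirin fever", passages[0], lambda q, p: 2.0)
```

In a real setup the synthetic pairs would serve as training data for the neural retriever, and the interpolation weight would need to be tuned for the target domain.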
