Paper Title
Improving Question Answering with Generation of NQ-like Questions
Authors
Abstract
Question Answering (QA) systems require a large amount of annotated data, which is costly and time-consuming to gather. Converting the datasets of existing QA benchmarks is challenging because they differ in format and complexity. To address these issues, we propose an algorithm that automatically generates shorter questions, resembling the day-to-day human communication found in the Natural Questions (NQ) dataset, from the longer trivia questions in the Quizbowl (QB) dataset by leveraging style conversion between the two datasets. This provides an automated way to generate more data for our QA systems. To ensure the quality as well as the quantity of the data, we detect and remove ill-formed questions using a neural classifier. We demonstrate that, in a low-resource setting, using the generated data improves QA performance over the baseline system on both NQ and QB data. Our algorithm improves the scalability of training data while maintaining data quality for QA systems.