DuReader_robust: A Chinese Dataset Towards Evaluating Robustness and Generalization of Machine Reading Comprehension in Real-World Applications

Paper Title

DuReader_robust: A Chinese Dataset Towards Evaluating Robustness and Generalization of Machine Reading Comprehension in Real-World Applications

Authors

Tang, Hongxuan; Li, Hongyu; Liu, Jing; Hong, Yu; Wu, Hua; Wang, Haifeng

Abstract

Machine reading comprehension (MRC) is a crucial task in natural language processing and has achieved remarkable advancements. However, most neural MRC models are still far from robust and fail to generalize well in real-world applications. To comprehensively verify the robustness and generalization of MRC models, we introduce a real-world Chinese dataset, DuReader_robust. It is designed to evaluate MRC models from three aspects: over-sensitivity, over-stability and generalization. Compared to previous work, the instances in DuReader_robust are natural texts rather than altered, unnatural texts, so the dataset presents the challenges faced when applying MRC models to real-world applications. The experimental results show that MRC models do not perform well on the challenge test set. Moreover, we analyze the behavior of existing models on the challenge test set, which may provide suggestions for future model development. The dataset and code are publicly available at https://github.com/baidu/DuReader.
