Paper Title
Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems
Paper Authors
Abstract
Numerical Question Answering is the task of answering questions that require numerical capabilities. Previous work introduces general adversarial attacks to Numerical Question Answering but does not systematically explore the numerical capabilities specific to the task. In this paper, we propose a numerical capability diagnosis for a series of Numerical Question Answering systems and datasets. We highlight a series of numerical capabilities and design a corresponding dataset perturbation for each. Empirical results indicate that existing systems are severely challenged by these perturbations. For example, Graph2Tree experienced a 53.83% absolute accuracy drop against the "Extra" perturbation on ASDiv-a, and BART experienced a 13.80% accuracy drop against the "Language" perturbation on the numerical subset of DROP. As a countermeasure, we also investigate how effectively applying the perturbations as data augmentation relieves systems' lack of robust numerical capabilities. Experimental analysis and empirical studies demonstrate that Numerical Question Answering with robust numerical capabilities remains largely an open question. We discuss future directions for Numerical Question Answering and summarize guidelines for future dataset collection and system design.