Shortcomings of Question Answering Based Factuality Frameworks for Error Localization

Paper Title

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization

Authors

Ryo Kamoi, Tanya Goyal, Greg Durrett

Abstract

Despite recent progress in abstractive summarization, models often generate summaries with factual errors. Numerous approaches to detect these errors have been proposed, the most popular of which are question answering (QA)-based factuality metrics. These have been shown to work well at predicting summary-level factuality and have the potential to localize errors within summaries, but this latter capability has not been systematically evaluated in past research. In this paper, we conduct the first such analysis and find that, contrary to our expectations, QA-based frameworks fail to correctly identify error spans in generated summaries and are outperformed by trivial exact match baselines. Our analysis reveals a major reason for such poor localization: questions generated by the question generation (QG) module often inherit errors from non-factual summaries, and these errors are then propagated further into downstream modules. Moreover, even human-in-the-loop question generation cannot easily offset these problems. Our experiments conclusively show that there exist fundamental issues with localization using the QA framework which cannot be fixed solely by stronger QA and QG models.
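The abstract does not define the "trivial exact match baseline" in detail. As a rough illustration only, the sketch below assumes a token-level variant: any summary token that never appears in the source document (ignoring a small stopword list) is flagged as a potential error, and consecutive flagged tokens are merged into spans. The names `tokenize`, `exact_match_error_spans`, and `STOPWORDS` are hypothetical and this is not necessarily the baseline used in the paper.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
             "is", "was", "were", "are", "be", "by", "for", "with", "as"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def exact_match_error_spans(source: str, summary: str) -> list[tuple[int, int]]:
    """Flag maximal runs of summary tokens that never appear in the source.

    Returns half-open (start, end) token-index spans over tokenize(summary).
    Stopwords are never flagged, so they also end a flagged run.
    """
    source_vocab = set(tokenize(source))
    tokens = tokenize(summary)
    spans: list[tuple[int, int]] = []
    start = None
    for i, tok in enumerate(tokens):
        unsupported = tok not in source_vocab and tok not in STOPWORDS
        if unsupported and start is None:
            start = i                      # open a new flagged span
        elif not unsupported and start is not None:
            spans.append((start, i))       # close the current span
            start = None
    if start is not None:
        spans.append((start, len(tokens)))  # span runs to the end
    return spans


source = "The company reported a profit of 5 million dollars in 2019."
summary = "The firm reported a loss of 5 million dollars in 2019."
tokens = tokenize(summary)
spans = exact_match_error_spans(source, summary)
print([" ".join(tokens[s:e]) for s, e in spans])  # ['firm', 'loss']
```

Here the baseline correctly flags "loss" but also flags the harmless paraphrase "firm", which shows why such a heuristic is considered trivial. It is never misled by the wording of the summary itself, however, whereas a QA pipeline can copy an error into a generated question.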
