'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities

Paper Title

'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities

Authors

Huang, Albert

Abstract

Classic lexical-matching-based QA metrics are slowly being phased out because they penalize succinct or informative outputs simply because those answers were not provided as ground truth. Recently proposed neural metrics can evaluate semantic similarity, but they were trained on small textual similarity datasets grafted from foreign domains. We introduce the Wiki Entity Similarity (WES) dataset, an 11M-example, domain-targeted semantic entity similarity dataset generated from link texts in Wikipedia. WES is tailored to QA evaluation: the examples are entities and phrases, grouped into semantic clusters to simulate multiple ground-truth labels. Human annotators consistently agree with WES labels, and a basic cross-encoder metric is better than four classic metrics at predicting human judgments of correctness.
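To make the two central ideas of the abstract concrete, here is a minimal, hypothetical sketch. It is not the authors' pipeline, whose filtering and sampling details are not described here. It assumes Wikipedia links are available as `(anchor_text, target_entity)` pairs. The sketch groups anchor texts by the entity they link to, which yields a semantic cluster of names per entity. From those clusters it forms positive pairs (two names for the same entity) and negative pairs (names for different entities). The negative-sampling rule used here, taking the alphabetically smallest name from each entity, is an arbitrary illustrative choice. The sketch also includes a SQuAD-style token F1 to show how a lexical metric scores two names for the same entity, such as "JFK" and "John F. Kennedy", as 0.0. The function names `build_clusters`, `make_pairs`, and `token_f1` are illustrative only.

```python
import re
import string
from collections import Counter, defaultdict
from itertools import combinations


def normalize(text: str) -> list:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, split."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Classic lexical-matching metric: F1 over overlapping normalized tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_clusters(links):
    """Hypothetical: group link (anchor) texts by the entity they point to."""
    clusters = defaultdict(set)
    for anchor_text, target_entity in links:
        clusters[target_entity].add(anchor_text)
    return clusters


def make_pairs(clusters):
    """Hypothetical: label 1 for two names of one entity, 0 for names of two entities."""
    positives = []
    for names in clusters.values():
        positives.extend((a, b, 1) for a, b in combinations(sorted(names), 2))
    # Illustrative negative sampling: one name per entity, for every pair of entities.
    negatives = [
        (min(clusters[e1]), min(clusters[e2]), 0)
        for e1, e2 in combinations(sorted(clusters), 2)
    ]
    return positives + negatives


if __name__ == "__main__":
    links = [
        ("JFK", "John F. Kennedy"),
        ("President Kennedy", "John F. Kennedy"),
        ("Lincoln", "Abraham Lincoln"),
        ("Honest Abe", "Abraham Lincoln"),
    ]
    for pair in make_pairs(build_clusters(links)):
        print(pair)
    # A lexical metric gives no credit to a correct alias:
    print(token_f1("JFK", "John F. Kennedy"))  # 0.0
```

In this sketch, every pair labeled 1 is exactly the kind of "correct but not the listed ground truth" answer that token F1 scores low. A learned similarity metric trained on such pairs, for example a cross-encoder, could in principle give it credit.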

Paper Title