Paper Title
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Paper Authors
Paper Abstract
Recent years have brought about a renewed interest in commonsense representation and reasoning in the field of natural language understanding. The development of new commonsense knowledge graphs (CSKGs) has been central to these advances as their diverse facts can be used and referenced by machine learning models for tackling new and challenging tasks. At the same time, there remain questions about the quality and coverage of these resources due to the massive scale required to comprehensively encompass general commonsense knowledge. In this work, we posit that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents. Therefore, we propose a new evaluation framework for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them. With this new goal, we propose ATOMIC 2020, a new CSKG of general-purpose commonsense knowledge containing knowledge that is not readily available in pretrained language models. We evaluate its properties in comparison with other leading CSKGs, performing the first large-scale pairwise study of commonsense knowledge resources. Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events. Finally, through human evaluation, we show that the few-shot performance of GPT-3 (175B parameters), while impressive, remains ~12 absolute points lower than a BART-based knowledge model trained on ATOMIC 2020 despite using over 430x fewer parameters.
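To make the knowledge-model setup in the abstract concrete, the following is a minimal sketch (not the authors' released code) of fine-tuning a BART model to generate the tail of an ATOMIC-style (head, relation, tail) triple and then querying it for an unseen event. The model size, learning rate, and example triples are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch, assuming the COMET-style recipe of casting KG completion
# as conditional text generation. Uses Hugging Face transformers; the model
# checkpoint, optimizer settings, and triples below are illustrative only.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-base"  # assumed size; the paper's BART variant may differ
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical training triples in ATOMIC's head / relation / tail format.
triples = [
    ("PersonX buys an umbrella", "xIntent", "to stay dry in the rain"),
    ("PersonX buys an umbrella", "xEffect", "carries the umbrella around"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for head, relation, tail in triples:
    # Serialize head + relation as the source sequence; the tail is the target.
    inputs = tokenizer(f"{head} {relation}", return_tensors="pt")
    labels = tokenizer(tail, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Query the trained knowledge model for a new, unseen event.
model.eval()
query = tokenizer("PersonX adopts a rescue dog xEffect", return_tensors="pt")
generated = model.generate(**query, max_length=24, num_beams=5)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

Serializing the head and relation into a single source sequence is what lets a seq2seq model of this kind produce commonsense inferences for entities and events never seen in the training graph, which is the property the abstract's evaluation framework is designed to test.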