Paper Title
CLSE: Corpus of Linguistically Significant Entities
Paper Authors
Paper Abstract
One of the biggest challenges of natural language generation (NLG) is the proper handling of named entities. Named entities are a common source of grammar mistakes such as wrong prepositions, wrong article handling, or incorrect entity inflection. Unless linguistic representation is factored in, such errors are often underrepresented when evaluating on a small set of arbitrarily picked argument values, or when translating a dataset from a linguistically simpler language, like English, to a linguistically complex language, like Russian. However, for some applications, broadly precise grammatical correctness is critical: native speakers may find entity-related grammar errors silly, jarring, or even offensive. To enable the creation of more linguistically diverse NLG datasets, we release a Corpus of Linguistically Significant Entities (CLSE) annotated by expert linguists. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. To demonstrate one possible use of CLSE, we produce an augmented version of the Schema-Guided Dialog Dataset, SGD-CLSE. Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected). We establish quality baselines for neural, template-based, and hybrid NLG systems and discuss the strengths and weaknesses of each approach.