Paper Title
Data Augmentation for Copy-Mechanism in Dialogue State Tracking
Paper Authors
Paper Abstract
While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performance on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both the training set and the test set) and unseen ones (values that occur in the test set but not in the training set). Recently, the copy-mechanism, which copies slot values directly from the user utterance, has been widely used in DST models to handle unseen slot values. In this paper, we aim to identify the factors that influence the generalization ability of a common copy-mechanism model for DST. Our key observations include: 1) the copy-mechanism tends to memorize values rather than infer them from context, which is the primary reason for unsatisfactory generalization performance; 2) greater diversity of slot values in the training set increases performance on unseen values but slightly decreases performance on seen values. Moreover, we propose a simple but effective data augmentation algorithm for training copy-mechanism models, which augments the input dataset by copying user utterances and replacing the real slot values with randomly generated strings. Two hyper-parameters allow users to trade off performance on seen values against performance on unseen ones, as well as overall performance against computational cost. Experimental results on three widely used datasets (WoZ 2.0, DSTC2, and Multi-WoZ 2.0) show the effectiveness of our approach.
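For concreteness, below is a minimal Python sketch of the kind of augmentation the abstract describes, not the authors' released implementation: each labelled utterance is copied, and its real slot value is replaced by a randomly generated string in both the utterance text and the label. The (utterance, slot, value) data layout and the hyper-parameter names `n_copies` (augmented copies per utterance, driving computational cost) and `max_len` (length of generated values) are assumptions made for illustration.

```python
import random
import string

def random_value(max_len=8):
    """Generate a random lowercase string to stand in for a real slot value."""
    length = random.randint(1, max_len)
    return "".join(random.choices(string.ascii_lowercase, k=length))

def augment(dataset, n_copies=2, max_len=8):
    """Augment (utterance, slot, value) triples by copying each utterance
    and swapping the real slot value for a random string, in both the
    utterance text and its label (hypothetical data layout)."""
    augmented = list(dataset)
    for utterance, slot, value in dataset:
        # Only utterances that literally contain the value can be rewritten.
        if value not in utterance:
            continue
        for _ in range(n_copies):
            fake = random_value(max_len)
            augmented.append((utterance.replace(value, fake), slot, fake))
    return augmented

# Minimal usage example with one WoZ-style labelled utterance.
data = [("i want a cheap restaurant in the north", "area", "north")]
for example in augment(data, n_copies=2):
    print(example)
```

Because the copied utterances pair random, never-before-seen strings with the correct copy targets, a model trained on them is pushed to locate values from context rather than memorize them, which is the behavior the abstract's first observation identifies as the bottleneck.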