A Framework to Generate High-Quality Datapoints for Multiple Novel Intent Detection

Paper Title

A Framework to Generate High-Quality Datapoints for Multiple Novel Intent Detection

Authors

Ankan Mullick, Sukannya Purkayastha, Pawan Goyal, Niloy Ganguly

Abstract

Systems such as voice-command-based conversational agents are characterized by a predefined set of skills or intents for performing user-specified tasks. Over time, newer intents may emerge that require retraining. However, these newer intents may not be explicitly announced and must be inferred dynamically. There are thus two important tasks at hand: (a) identifying emerging new intents, and (b) annotating data for the new intents so that the underlying classifier can be retrained efficiently. These tasks become especially challenging when a large number of new intents emerge simultaneously and the budget for manual annotation is limited. In this paper, we propose MNID (Multiple Novel Intent Detection), a cluster-based framework that detects multiple novel intents under a fixed human-annotation budget. Empirical results on various benchmark datasets of different sizes demonstrate that MNID, by intelligently using the annotation budget, outperforms the baseline methods in terms of accuracy and F1-score.
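The abstract describes MNID only as a cluster-based framework that works within a fixed annotation budget. As a generic illustration of that idea, and explicitly not the authors' algorithm, the sketch below clusters utterance embeddings with plain k-means, spends one annotation query per cluster on its most central utterance (largest clusters first), and flags any returned label outside the known intent set as novel. The 2-D embeddings, the `annotate` oracle, the intent names, the farthest-first initialization, and the largest-cluster-first policy are all hypothetical choices made for this example.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical sketch of budgeted, cluster-based novel-intent discovery.
# It is NOT the MNID algorithm from the paper; names and policies are illustrative.

Vector = Tuple[float, ...]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(xs) / len(vectors) for xs in zip(*vectors))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means with farthest-first initialization; returns a cluster id per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign


def discover_novel_intents(
    points: Sequence[Vector],
    k: int,
    budget: int,
    known: Set[str],
    annotate: Callable[[int], str],
) -> Set[str]:
    """Spend at most `budget` annotation queries, one per cluster, largest clusters first."""
    assign = kmeans(points, k)
    clusters = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    # Stable sort: equal-size clusters keep their cluster-id order.
    clusters = sorted((m for m in clusters if m), key=len, reverse=True)
    novel: Set[str] = set()
    for members in clusters[:budget]:
        centroid = mean([points[i] for i in members])
        # Ask the (simulated) annotator to label only the most central utterance.
        rep = min(members, key=lambda i: dist(points[i], centroid))
        label = annotate(rep)
        if label not in known:
            novel.add(label)
    return novel


# Toy data: three well-separated blobs of hypothetical utterance embeddings.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5),
          (-10.0, 10.0), (-10.5, 10.0), (-10.0, 10.5), (-10.5, 10.5)]
gold = ["play_music"] * 4 + ["book_taxi"] * 4 + ["set_alarm"] * 4

found = discover_novel_intents(points, k=3, budget=3, known={"play_music"},
                               annotate=lambda i: gold[i])
print(sorted(found))  # ['book_taxi', 'set_alarm']
```

With `budget=1` the only query lands on the known `play_music` cluster and nothing novel is found, which illustrates why the choice of which clusters to annotate matters when the budget is tight.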