"Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

Paper title

"Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs

Authors

Shuo Zhang, Junzhou Zhao, Pinghui Wang, Yu Li, Yi Huang, Junlan Feng

Abstract

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from the labeled multi-action dialog samples. Due to data limitations, they generalize poorly toward unseen dialog flows. While interactive learning and reinforcement learning algorithms can be applied to incorporate external data sources of real users and user simulators, they take significant manual effort to build and suffer from instability. To address these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics to enhance multi-action prediction. Our PEDP method employs model-based planning for conceiving what to express before deciding the current response through simulating single-action dialogs. Experimental results on the MultiWOZ dataset demonstrate that our fully supervised learning-based method achieves a solid task success rate of 90.6%, improving 3% compared to the state-of-the-art methods.
