Paper Title
Learning Dialog Policies from Weak Demonstrations
Paper Authors
Paper Abstract
Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, achieving high success rates even when trained on out-of-domain data.
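For context, DQfD augments the usual temporal-difference objective with a large-margin supervised loss on demonstration transitions, which pushes the Q-value of the demonstrated action above all alternatives by at least a fixed margin. Below is a minimal sketch of that margin loss; the function name and the margin value are illustrative choices, not taken from the paper.

```python
import numpy as np

def dqfd_margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin supervised loss used by DQfD on demonstration data.

    q_values:      1-D array of Q(s, a) over all actions in state s.
    expert_action: index of the action taken in the demonstration.
    margin:        illustrative margin constant (a tunable hyperparameter).
    """
    # l(a_E, a): zero for the demonstrated action, `margin` for every other action.
    penalties = np.full_like(q_values, margin, dtype=float)
    penalties[expert_action] = 0.0
    # J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E)
    return np.max(q_values + penalties) - q_values[expert_action]

# Example: the loss is zero once the expert action's Q-value
# exceeds all others by at least the margin.
print(dqfd_margin_loss(np.array([0.1, 2.0, 0.5]), expert_action=1))  # 0.0
print(dqfd_margin_loss(np.array([1.9, 2.0, 0.5]), expert_action=1))  # 0.7
```

In the full algorithm this term is added to the one-step and n-step TD losses (plus L2 regularization) when training on demonstration transitions, so the agent both imitates the demonstrator and satisfies the Bellman equation.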