Paper Title

Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

Paper Authors

Jan Deriu, Don Tuggener, Pius von Däniken, Mark Cieliebak

Paper Abstract

This paper introduces an adversarial method to stress-test trained metrics for evaluating conversational dialogue systems. The method leverages Reinforcement Learning to find response strategies that elicit optimal scores from the trained metrics. We apply our method to test recently proposed trained metrics. We find that they are all susceptible to giving high scores to responses generated by relatively simple and obviously flawed strategies that our method converges on. For instance, simply copying parts of the conversation context to form a response yields competitive scores or even outperforms responses written by humans.
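
To make the failure mode in the last sentence concrete, below is a minimal, hypothetical sketch (in Python) of such a copy-the-context strategy, scored against a toy overlap-based stand-in for a trained metric. The function names and the metric are illustrative assumptions only; they are not the paper's actual strategies, metrics, or code.

```python
# Hypothetical sketch: a "copy parts of the conversation context" response
# strategy and a toy stand-in metric that rewards word overlap with the
# context, which is exactly the kind of weakness such a strategy exploits.

from typing import List


def copy_context_response(context: List[str], n_words: int = 8) -> str:
    """Form a 'response' by copying the first n_words of the last context turn."""
    last_turn = context[-1] if context else ""
    return " ".join(last_turn.split()[:n_words])


def toy_metric(context: List[str], response: str) -> float:
    """Toy stand-in for a trained metric (not one of the metrics tested in the
    paper): scores the fraction of response words that also appear in the context."""
    context_words = set(" ".join(context).lower().split())
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    overlap = sum(word in context_words for word in response_words)
    return overlap / len(response_words)


if __name__ == "__main__":
    dialogue = [
        "I just got back from a trip to the mountains.",
        "That sounds great, what did you enjoy the most about it?",
    ]
    response = copy_context_response(dialogue)
    print(response)                        # a copied fragment of the context
    print(toy_metric(dialogue, response))  # high score despite the flawed reply
```

In this sketch the copied fragment scores near the maximum, illustrating how a metric that correlates with context overlap can be gamed by a trivially flawed policy; the paper's method uses Reinforcement Learning to discover such degenerate strategies automatically.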
