Title
A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS
Authors
Abstract
Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a large number of end-users. A common issue is that errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid this, we consider a proactive interaction mechanism in which the system predicts user satisfaction with a candidate response before giving it to the user. If the prediction indicates that the user is unlikely to be satisfied, the system asks the user a suitable question to determine their real intent instead of providing the response directly. Through such interaction, the system can give the user a better response. Previous user satisfaction prediction models are not applicable to DuerOS, a large-scale commercial dialogue system. They rely on hand-crafted features and thus can hardly learn the complex patterns underlying millions of conversations or the temporal dependencies across multiple turns of a conversation. Moreover, they are trained and evaluated on benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To address these challenges, we propose a pipeline that predicts user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model on these weak labels to predict user satisfaction. Empirically, we deploy and evaluate our model on DuerOS and observe a 19% relative improvement in the accuracy of user satisfaction prediction and a 2.3% relative improvement in user experience.
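The proactive interaction mechanism described in the abstract amounts to a gate placed in front of each candidate response. The sketch below illustrates that gate only. It is not DuerOS code, and the names `decide`, `toy_predictor`, and `toy_clarifier`, the `(utterance, response)` turn representation, and the 0.5 threshold are all hypothetical. The toy predictor merely stands in for the transformer-based model trained on weak labels, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation of one past turn: (user utterance, system response).
Turn = Tuple[str, str]


@dataclass
class Decision:
    action: str   # "respond" or "clarify"
    score: float  # predicted probability that the user is satisfied
    output: str   # text actually returned to the user


def decide(
    history: List[Turn],
    utterance: str,
    candidate: str,
    predict_satisfaction: Callable[[List[Turn], str, str], float],
    ask_clarification: Callable[[str], str],
    threshold: float = 0.5,  # assumed operating point, not a value from the paper
) -> Decision:
    """Gate a candidate response on predicted user satisfaction.

    The predictor also receives the dialogue history, since the abstract
    stresses temporal dependencies across multiple turns.
    """
    score = predict_satisfaction(history, utterance, candidate)
    if score >= threshold:
        # Likely satisfying: deliver the candidate response directly.
        return Decision("respond", score, candidate)
    # Likely unsatisfying: ask a question to pin down the user's intent instead.
    return Decision("clarify", score, ask_clarification(utterance))


# --- Toy stand-ins (hypothetical; NOT the paper's transformer model) ---
def toy_predictor(history: List[Turn], utterance: str, candidate: str) -> float:
    # Treat fallback answers as likely to disappoint the user.
    return 0.2 if candidate.startswith("Sorry") else 0.9


def toy_clarifier(utterance: str) -> str:
    # A fixed template; a real system would generate a context-specific question.
    return f'Could you clarify what you mean by "{utterance}"?'


ok = decide([], "play some jazz", "Now playing jazz.", toy_predictor, toy_clarifier)
bad = decide([], "play jay", "Sorry, I didn't catch that.", toy_predictor, toy_clarifier)
print(ok.action, ok.output)    # respond Now playing jazz.
print(bad.action, bad.output)  # clarify Could you clarify what you mean by "play jay"?
```

Keeping the predictor behind a plain callable keeps the gate independent of how the satisfaction model is built or trained; replacing the toy predictor with a learned model only changes the function passed to `decide`.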