Paper Title


INFACT: An Online Human Evaluation Framework for Conversational Recommendation

Paper Authors

Manzoor, Ahtsham; Jannach, Dietmar

Paper Abstract


Conversational recommender systems (CRS) are interactive agents that support their users in recommendation-related goals through multi-turn conversations. Generally, a CRS can be evaluated along various dimensions. Today's CRS mainly rely on offline (computational) measures to assess the performance of their algorithms in comparison to different baselines. However, offline measures can have limitations, for example, when the metrics for comparing a newly generated response with a ground truth do not correlate with human perceptions, because various alternative generated responses might also be suitable in a given dialog situation. Current research on machine learning-based CRS models therefore acknowledges the importance of humans in the evaluation process, knowing that pure offline measures may not be sufficient for evaluating a highly interactive system like a CRS.
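To illustrate the limitation the abstract points to, here is a minimal sketch (our own example; the paper does not prescribe a specific metric, and the dialog sentences below are hypothetical). It computes sentence-level BLEU with NLTK against a single ground-truth response: a near-copy of the reference scores well, while an equally appropriate alternative recommendation scores near zero, which is why overlap-based offline metrics can diverge from human judgments.

```python
# Illustrative sketch (not from the paper): overlap-based offline metrics
# reward surface similarity to one ground-truth response, even though other
# replies may be just as suitable in the same dialog situation.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

ground_truth = "you should watch the matrix it is a classic sci-fi movie".split()

candidates = {
    "near-copy of reference": "you should watch the matrix a classic sci-fi movie".split(),
    "equally valid alternative": "inception is a great pick if you enjoy mind-bending sci-fi".split(),
}

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences

for name, candidate in candidates.items():
    # sentence_bleu takes a list of tokenized references and one tokenized hypothesis
    score = sentence_bleu([ground_truth], candidate, smoothing_function=smooth)
    print(f"{name}: BLEU = {score:.3f}")
```

Both candidate replies could satisfy the user, but only the one that lexically overlaps the single reference is rewarded, which motivates complementing offline metrics with human evaluation as the paper argues.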
