Paper Title
Do Offline Metrics Predict Online Performance in Recommender Systems?
Paper Authors
Abstract
Recommender systems operate in an inherently dynamical setting. Past recommendations influence future behavior, including which data points are observed and how user preferences change. However, experimenting in production systems with real user dynamics is often infeasible, and existing simulation-based approaches have limited scale. As a result, many state-of-the-art algorithms are designed to solve supervised learning problems, and progress is judged only by offline metrics. In this work we investigate the extent to which offline metrics predict online performance by evaluating eleven recommenders across six controlled simulated environments. We observe that offline metrics are correlated with online performance over a range of environments. However, improvements in offline metrics lead to diminishing returns in online performance. Furthermore, we observe that the ranking of recommenders varies depending on the amount of initial offline data available. We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm. We provide the environments and recommenders described in this paper as RecLab, an extensible ready-to-use simulation framework, at https://github.com/berkeley-reclab/RecLab.
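To make the offline-vs-online distinction concrete, below is a minimal, self-contained sketch of the kind of closed-loop evaluation the abstract describes. It is not RecLab's actual API; the environment (users with latent topic preferences, noisy ratings), the toy per-user-per-topic mean recommender, and all names are illustrative assumptions. The offline metric is RMSE on a held-out log of random interactions; the online metric is the mean observed rating when the recommender chooses items greedily and its own recommendations determine which data it sees next.

```python
import random
from collections import defaultdict

random.seed(0)
N_USERS, N_TOPICS, N_ITEMS = 20, 5, 50

# Hypothetical ground truth, hidden from the recommender: each user has a
# preference score per topic, and each item belongs to one topic.
prefs = [[random.uniform(1, 5) for _ in range(N_TOPICS)] for _ in range(N_USERS)]
item_topic = [random.randrange(N_TOPICS) for _ in range(N_ITEMS)]

def true_rating(u, i):
    """Noisy rating the simulated user u gives item i."""
    return prefs[u][item_topic[i]] + random.gauss(0, 0.3)

# Offline phase: a log of randomly collected (user, item, rating) triples.
offline = [(u, i, true_rating(u, i))
           for u in range(N_USERS)
           for i in random.sample(range(N_ITEMS), 10)]

# Toy recommender: running mean rating per (user, topic), global-mean fallback.
sums, counts = defaultdict(float), defaultdict(int)
for u, i, r in offline:
    sums[(u, item_topic[i])] += r
    counts[(u, item_topic[i])] += 1
global_mean = sum(r for _, _, r in offline) / len(offline)

def predict(u, i):
    key = (u, item_topic[i])
    return sums[key] / counts[key] if counts[key] else global_mean

# Offline metric: RMSE on a held-out log drawn the same way as the training log.
heldout = [(u, i, true_rating(u, i))
           for u in range(N_USERS)
           for i in random.sample(range(N_ITEMS), 5)]
rmse = (sum((predict(u, i) - r) ** 2 for u, i, r in heldout) / len(heldout)) ** 0.5

# Online metric: mean observed rating over T greedy recommendations, where each
# observation feeds back into the model -- the recommender shapes its own data.
T, total = 100, 0.0
for _ in range(T):
    u = random.randrange(N_USERS)
    i = max(range(N_ITEMS), key=lambda j: predict(u, j))  # greedy choice
    r = true_rating(u, i)
    total += r
    sums[(u, item_topic[i])] += r
    counts[(u, item_topic[i])] += 1

print(f"offline RMSE:       {rmse:.2f}")
print(f"online mean rating: {total / T:.2f}")
```

The two numbers need not move in lockstep: a model can shave its RMSE while barely changing which item is the argmax for each user, which is one way improvements in the offline metric can yield diminishing returns online.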