论文标题
通过交叉解耦网络(CDN)授权长尾项目推荐
Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)
论文作者
论文摘要
行业推荐系统通常会遭受高度相关的长尾项目分布,其中一小部分项目都会收到大多数用户反馈。这种偏斜损害了推荐质量,尤其是对于物品切片而没有大量用户反馈。尽管学术界已经取得了许多研究进展,但在生产中采用这些方法非常困难,行业中几乎没有改进。一个挑战是这些方法通常会损害整体表现。此外,它们可能很复杂且昂贵的训练和服务。在这项工作中,我们旨在改善尾部项目的建议,同时以更少的培训和服务成本来保持整体性能。我们首先发现,在长尾分布中,用户偏好的预测是偏差的。偏见来自两个角度的培训和服务数据之间的差异:1)项目分布,以及2)用户的喜好给定项目。大多数现有的方法主要尝试从项目分布的角度减少偏见,而忽略了给定项目的用户喜好的差异。这导致了严重的遗忘问题,并导致了次优的性能。 为了解决这个问题,我们设计了一个新颖的交叉解耦网络(CDN)(i)通过专家架构的混合物在项目侧的记忆和概括过程中解除了学习过程; (ii)通过正规双边分支网络将用户样本从不同分布中分离。最后,引入了一个新的适配器来汇总矢量,并将训练的注意力轻松地转移到尾部。广泛的实验结果表明,CDN在基准数据集上的表现明显优于最先进的方法。我们还通过在Google的大规模推荐系统中对CDN进行案例研究来证明其有效性。
Industry recommender systems usually suffer from highly-skewed long-tail item distributions where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality especially for the item slices without much user feedback. While there have been many research advances made in academia, deploying these methods in production is very difficult and very few improvements have been made in industry. One challenge is that these methods often hurt overall performance; additionally, they could be complex and expensive to train and serve. In this work, we aim to improve tail item recommendations while maintaining the overall performance with less training and serving cost. We first find that the predictions of user preferences are biased under long-tail distributions. The bias comes from the differences between training and serving data in two perspectives: 1) the item distributions, and 2) user's preference given an item. Most existing methods mainly attempt to reduce the bias from the item distribution perspective, ignoring the discrepancy from user preference given an item. This leads to a severe forgetting issue and results in sub-optimal performance. To address the problem, we design a novel Cross Decoupling Network (CDN) (i) decouples the learning process of memorization and generalization on the item side through a mixture-of-expert architecture; (ii) decouples the user samples from different distributions through a regularized bilateral branch network. Finally, a new adapter is introduced to aggregate the decoupled vectors, and softly shift the training attention to tail items. Extensive experimental results show that CDN significantly outperforms state-of-the-art approaches on benchmark datasets. We also demonstrate its effectiveness by a case study of CDN in a large-scale recommendation system at Google.