Paper Title
Sampler Design for Implicit Feedback Data by Noisy-label Robust Learning
Paper Authors
Paper Abstract
Implicit feedback data is extensively explored in recommendation as it is easy to collect and generally applicable. However, predicting users' preferences from implicit feedback data is a challenging task since we can only observe positive (voted) samples and unvoted samples: it is difficult to distinguish the real negative samples from the unlabeled positive ones among the unvoted samples. Existing works, such as Bayesian Personalized Ranking (BPR), sample unvoted items uniformly as negative samples and therefore suffer from a critical noisy-label issue. To address this gap, we design an adaptive sampler based on noisy-label robust learning for implicit feedback data. To formulate the issue, we first introduce Bayesian Point-wise Optimization (BPO) to learn a model, e.g., Matrix Factorization (MF), by maximum likelihood estimation. We predict users' preferences with the model and learn it by maximizing the likelihood of the observed data labels, i.e., a user prefers her positive samples and has no interest in her unvoted samples. In reality, however, a user may be interested in some of her unvoted samples, which are in fact positive samples mislabeled as negative ones. We then consider the risk of these noisy labels and propose a Noisy-label Robust BPO (NBPO). NBPO also maximizes the observation likelihood, while connecting users' preferences to the observed labels through the likelihood of label flipping based on Bayes' theorem. In NBPO, a user prefers her true positive samples and shows no interest in her true negative samples, hence the optimization quality is dramatically improved. Extensive experiments on two public real-world datasets show significant improvements from our proposed optimization methods.
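The following is a minimal sketch, not the authors' released code, of the idea stated in the abstract: MF scores are turned into preference probabilities and trained point-wise, either on the observed labels directly (a BPO-style objective) or through a label-flip probability that links true preferences to observed labels (an NBPO-style objective). The class `MF`, the functions `bpo_loss` and `nbpo_loss`, the flip rate `rho`, and all dimensions are illustrative assumptions.

```python
# Sketch only: point-wise MF training with and without a noisy-label correction.
import torch
import torch.nn as nn

class MF(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        # Predicted probability that user u truly prefers item i.
        return torch.sigmoid((self.user(u) * self.item(i)).sum(-1))

def bpo_loss(p, y):
    # BPO-style objective: log-likelihood of the *observed* labels,
    # treating every unvoted sample as a clean negative.
    return -(y * torch.log(p + 1e-8) + (1 - y) * torch.log(1 - p + 1e-8)).mean()

def nbpo_loss(p, y, rho=0.1):
    # NBPO-style objective (assumed form): connect the true preference p to the
    # observed label through a flip probability rho = P(observed 0 | truly positive),
    # so P(y=1) = (1 - rho) * p and P(y=0) = (1 - p) + rho * p.
    p_obs_pos = (1.0 - rho) * p
    p_obs_neg = (1.0 - p) + rho * p
    return -(y * torch.log(p_obs_pos + 1e-8)
             + (1 - y) * torch.log(p_obs_neg + 1e-8)).mean()

# Toy usage: the first two interactions are observed positives,
# the last two are uniformly sampled unvoted items labeled as negatives.
model = MF(n_users=100, n_items=500)
u = torch.tensor([0, 1, 0, 1])
i = torch.tensor([10, 20, 33, 47])
y = torch.tensor([1.0, 1.0, 0.0, 0.0])
p = model(u, i)
loss = nbpo_loss(p, y)
loss.backward()
```

In this sketch the flip rate is a fixed scalar for readability; the abstract's description suggests the flipping likelihood is learned jointly with the model, so a per-sample or parameterized `rho` would be closer to the proposed method.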