Paper Title

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

Authors

Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Abstract

Long-term engagement is preferred over immediate engagement in sequential recommendation as it directly affects product operational metrics such as daily active users (DAUs) and dwell time. Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation. However, due to expensive online interactions, it is very difficult for RL algorithms to perform state-action value estimation, exploration and feature extraction when optimizing long-term engagement. In this paper, we propose ResAct which seeks a policy that is close to, but better than, the online-serving policy. In this way, we can collect sufficient data near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. ResAct optimizes the policy by first reconstructing the online behaviors and then improving it via a Residual Actor. To extract long-term information, ResAct utilizes two information-theoretical regularizers to confirm the expressiveness and conciseness of features. We conduct experiments on a benchmark dataset and a large-scale industrial dataset which consists of tens of millions of recommendation requests. Experimental results show that our method significantly outperforms the state-of-the-art baselines in various long-term engagement optimization tasks.
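
The abstract outlines the core mechanism: first reconstruct the action the online-serving policy would take, then refine it with a small residual correction so the learned policy stays close to, but better than, the online behaviors. The sketch below illustrates that idea only; it is not the paper's implementation, and the network architectures, the `residual_scale` factor, and the continuous action representation are assumptions made for illustration.

```python
# Minimal sketch of the residual-actor idea from the abstract (NOT the paper's
# actual architecture). Dimensions, layer sizes, and residual_scale are
# hypothetical choices for illustration.
import torch
import torch.nn as nn


class ReconstructionActor(nn.Module):
    """Imitates the online-serving policy: predicts the action it would output."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class ResidualActor(nn.Module):
    """Outputs a bounded correction so the final action stays near the
    reconstructed online behavior while improving on it."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128,
                 residual_scale: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )
        self.residual_scale = residual_scale  # keeps the correction small

    def forward(self, state: torch.Tensor,
                recon_action: torch.Tensor) -> torch.Tensor:
        residual = self.net(torch.cat([state, recon_action], dim=-1))
        return recon_action + self.residual_scale * residual


# Usage: reconstruct the online action, then refine it with the residual actor.
state_dim, action_dim = 32, 8            # hypothetical dimensions
recon = ReconstructionActor(state_dim, action_dim)
res = ResidualActor(state_dim, action_dim)
state = torch.randn(4, state_dim)        # a batch of user/session states
final_action = res(state, recon(state))
```

Bounding the residual (here via a Tanh output scaled by `residual_scale`) is one plausible way to keep the learned policy within the neighborhood of the logged online behaviors, which is what allows state-action values to be estimated from collected data without online exploration.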
