Paper Title

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Authors

Yuan, Guanghu; Yuan, Fajie; Li, Yudong; Kong, Beibei; Li, Shujie; Chen, Lei; Yang, Min; Yu, Chenyun; Hu, Bo; Li, Zang; Xu, Yu; Qie, Xiaohu

Abstract

Existing benchmark datasets for recommender systems (RS) are either created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical value for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs. one-class recommendation); (3) it contains overlapping users and items across four different scenarios; (4) it contains various types of positive user feedback, such as clicks, likes, shares, and follows; (5) it contains additional features beyond user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for a majority of popular recommendation tasks.
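To make characteristics (2) and (4) more concrete, here is a minimal Python sketch, assuming a hypothetical per-impression record. The field names (`click`, `like`, `share`, `follow`) and the helper functions are illustrative assumptions, not Tenrec's actual schema or loader. Because each row stands for an item that was actually shown to the user, rows with `click == 0` act as true negatives; a one-class dataset logs only positives and would instead have to sample negatives from unobserved items.

```python
from dataclasses import dataclass


# Hypothetical per-impression record, for illustration only.
# These field names are assumptions, not Tenrec's published schema.
@dataclass
class Interaction:
    user_id: int
    item_id: int
    click: int   # 1 if the exposed item was clicked, 0 if shown but skipped
    like: int    # additional positive feedback signals (0/1)
    share: int
    follow: int


def split_feedback(rows):
    """Separate clicked impressions (positives) from exposed-but-skipped
    impressions, which serve as true negatives."""
    positives = [r for r in rows if r.click == 1]
    true_negatives = [r for r in rows if r.click == 0]
    return positives, true_negatives


def multi_feedback_labels(row):
    """Return one binary label per feedback type, e.g. as targets
    for a multi-task model."""
    return {"click": row.click, "like": row.like,
            "share": row.share, "follow": row.follow}


rows = [
    Interaction(1, 10, click=1, like=1, share=0, follow=0),
    Interaction(1, 11, click=0, like=0, share=0, follow=0),
    Interaction(2, 10, click=1, like=0, share=1, follow=1),
]
pos, neg = split_feedback(rows)
print(len(pos), len(neg))            # 2 1
print(multi_feedback_labels(rows[2]))  # {'click': 1, 'like': 0, 'share': 1, 'follow': 1}
```

Keeping every feedback type as its own label, rather than collapsing them into a single "positive" flag, is what allows the same records to be reused for tasks such as CTR prediction and multi-task learning.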