Paper Title

On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems

Paper Authors

Renjie Gu, Chaoyue Niu, Yikai Yan, Fan Wu, Shaojie Tang, Rongfeng Jia, Chengfei Lyu, Guihai Chen

Paper Abstract

Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream in industry, suboptimal for each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device training over a user's small number of local samples incurs severe overfitting and undermines the model's generalization ability. In this work, we propose a new device-cloud collaborative learning framework, called CoDA, to break the dilemma of purely cloud-based learning and on-device learning. The key principle of CoDA is to retrieve similar samples from the cloud's global pool to augment each user's local dataset for training the recommendation model. Specifically, after coarse-grained sample matching on the cloud, a personalized sample classifier is further trained on each device for fine-grained sample filtering, which can learn the boundary between the local data distribution and the outside data distribution. We also build an end-to-end pipeline to support the flows of data, model, computation, and control between the cloud and each device. We have deployed CoDA in a recommendation scenario of Mobile Taobao. Online A/B testing results show the remarkable performance improvement of CoDA over both cloud-based learning without model personalization and on-device training without data augmentation. Overhead testing on a real device demonstrates the computation, storage, and communication efficiency of the on-device tasks in CoDA.
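The abstract outlines CoDA's core mechanism: coarse-grained sample matching on the cloud, followed by a personalized on-device classifier for fine-grained filtering of the retrieved candidates. The snippet below is a minimal illustrative sketch of that two-stage augmentation, not the authors' implementation: it assumes cosine-similarity retrieval over sample embeddings for the coarse stage and a tiny logistic-regression classifier for the fine stage, and every name and hyperparameter (`coarse_match`, `SampleClassifier`, `top_k`, the learning rate) is a hypothetical choice for illustration.

```python
# Illustrative sketch of CoDA-style two-stage data augmentation.
# Assumptions (not from the paper): cosine-similarity retrieval for the
# cloud-side coarse matching, and a logistic-regression binary classifier
# for the on-device fine-grained filter.
import numpy as np

def coarse_match(local_embs, pool_embs, top_k):
    """Cloud side: rank the global pool by cosine similarity to the
    centroid of the user's local sample embeddings (coarse matching)."""
    centroid = local_embs.mean(axis=0)
    centroid = centroid / (np.linalg.norm(centroid) + 1e-8)
    pool_norm = pool_embs / (np.linalg.norm(pool_embs, axis=1, keepdims=True) + 1e-8)
    scores = pool_norm @ centroid
    return np.argsort(-scores)[:top_k]

class SampleClassifier:
    """Device side: personalized binary classifier for fine-grained filtering.
    Local samples are positives and randomly drawn outside samples are
    negatives, so it learns the boundary of the local data distribution."""

    def __init__(self, dim, lr=0.1, epochs=200):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def _prob(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))  # sigmoid

    def fit(self, x, y):
        for _ in range(self.epochs):
            grad = self._prob(x) - y           # gradient of the logistic loss
            self.w -= self.lr * (x.T @ grad) / len(y)
            self.b -= self.lr * grad.mean()

    def keep_mask(self, x, threshold=0.5):
        return self._prob(x) >= threshold      # keep "local-like" candidates

# Toy end-to-end flow: the cloud retrieves candidates, the device filters
# them, and the surviving samples augment the local training set.
rng = np.random.default_rng(0)
local = rng.normal(loc=1.0, size=(50, 16))     # user's scarce local samples
pool = rng.normal(loc=0.0, size=(5000, 16))    # cloud's global sample pool

candidates = pool[coarse_match(local, pool, top_k=500)]

negatives = pool[rng.choice(len(pool), size=len(local), replace=False)]
x_train = np.vstack([local, negatives])
y_train = np.concatenate([np.ones(len(local)), np.zeros(len(negatives))])

clf = SampleClassifier(dim=local.shape[1])
clf.fit(x_train, y_train)

augmented = np.vstack([local, candidates[clf.keep_mask(candidates)]])
print("local samples:", len(local), "-> augmented:", len(augmented))
```

The design point the sketch highlights is that the sample classifier is trained per user, so the decision boundary is personalized: the same global pool yields a different augmented dataset on each device.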
