Paper Title

Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models

Authors

Zhao-Yu Zhang, Xiang-Rong Sheng, Yujing Zhang, Biye Jiang, Shuguang Han, Hongbo Deng, Bo Zheng

Abstract

Deep learning techniques have been applied widely in industrial recommendation systems. However, far less attention has been paid to the overfitting problem of models in recommendation systems, which, on the contrary, is recognized as a critical issue for deep neural networks. In the context of Click-Through Rate (CTR) prediction, we observe an interesting one-epoch overfitting problem: the model performance exhibits a dramatic degradation at the beginning of the second epoch. Such a phenomenon has been witnessed widely in real-world applications of CTR models. Thereby, the best performance is usually achieved by training with only one epoch. To understand the underlying factors behind the one-epoch phenomenon, we conduct extensive experiments on the production data set collected from the display advertising system of Alibaba. The results show that the model structure, the optimization algorithm with a fast convergence rate, and the feature sparsity are closely related to the one-epoch phenomenon. We also provide a likely hypothesis for explaining such a phenomenon and conduct a set of proof-of-concept experiments. We hope this work can shed light on future research on training more epochs for better performance.