Challenges and approaches to privacy preserving post-click conversion prediction

Paper title

Challenges and approaches to privacy preserving post-click conversion prediction

Authors

O'Brien, Conor; Thiagarajan, Arvind; Das, Sourav; Barreto, Rafael; Verma, Chetan; Hsu, Tim; Neufield, James; Hunt, Jonathan J

Abstract

Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e., the probability that a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but regulatory and technical constraints increasingly require privacy-preserving approaches. For example, major platforms are moving to restrict the tracking of individual user events across multiple applications, and governments around the world have shown steadily growing interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. Using offline experiments on real-world data, we show that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.
