Title
Asymptotically Unbiased Estimation for Delayed Feedback Modeling via Label Correction
Authors
Abstract
Alleviating the delayed feedback problem is crucial for conversion rate (CVR) prediction in online advertising. Previous delayed feedback modeling methods use an observation window to balance the trade-off between waiting for accurate labels and consuming fresh feedback. Moreover, to estimate CVR on the freshly observed but biased distribution that contains fake negatives, importance sampling is widely used to reduce the distribution bias. While effective, we argue that previous approaches falsely treat fake negative samples as real negatives during importance weighting and do not fully utilize the observed positive samples, leading to suboptimal performance. In this work, we propose a new method, DElayed Feedback modeling with UnbiaSed Estimation (DEFUSE), which aims to separately correct the importance weights of immediate positive, fake negative, real negative, and delay positive samples at a finer granularity. Specifically, we propose a two-step optimization approach that first infers the probability of fake negatives among observed negatives before applying importance sampling. To fully exploit the ground-truth immediate positives from the observed distribution, we further develop a bi-distribution modeling framework to jointly model the unbiased immediate positives and the biased delayed conversions. Experimental results on both public and our industrial datasets validate the superiority of DEFUSE. Code is available at https://github.com/ychen216/DEFUSE.git.
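The two-step idea (first infer how likely an observed negative is a fake negative, then reweight the loss) can be illustrated with a minimal sketch. This is not the paper's loss; it is a simplified reweighting derived under stated assumptions. We assume a training stream in which every impression is emitted once after the observation window (label 1 if it converted inside the window, label 0 otherwise), and a delayed conversion is later re-emitted as a duplicate positive. The names `Sample`, `weighted_log_loss`, `expected_losses` and the kind tags `"IP"`, `"NEG"`, `"DP"` are hypothetical, and `p_fn` stands in for the output of an assumed step-1 model of P(fake negative | x, observed negative).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stream layout (assumption, not taken from the paper):
#   "IP":  converted inside the observation window, emitted once with label 1
#   "NEG": no conversion inside the window, emitted with label 0
#          (in truth either a fake negative FN or a real negative RN)
#   "DP":  duplicate positive re-emitted when a delayed conversion arrives


@dataclass
class Sample:
    kind: str     # "IP", "NEG" or "DP"
    p_cvr: float  # main model's predicted CVR f(x)
    p_fn: float   # step 1 (assumed model): P(fake negative | x, observed negative)


def weighted_log_loss(samples: List[Sample], eps: float = 1e-7) -> float:
    """Step 2: importance-weighted binary cross-entropy, summed over samples.

    Simplified weighting (our assumption): observed positives (IP, DP) keep
    weight 1. An observed negative is a real negative with probability
    1 - p_fn, so its log(1 - f) term is scaled by that probability; the
    fake-negative share gets weight 0 because its conversion is counted
    by the later DP duplicate.
    """
    total = 0.0
    for s in samples:
        f = min(max(s.p_cvr, eps), 1.0 - eps)  # clip to keep log() finite
        if s.kind in ("IP", "DP"):
            total -= math.log(f)
        elif s.kind == "NEG":
            total -= (1.0 - s.p_fn) * math.log(1.0 - f)
        else:
            raise ValueError(f"unknown sample kind: {s.kind}")
    return total


def expected_losses(p_ip: float, p_dp: float, p_rn: float, f: float) -> Tuple[float, float]:
    """Per-impression expected loss for one feature value x.

    p_ip, p_dp, p_rn: true probabilities that an impression is an immediate
    positive, a delayed positive, or a real negative (they sum to 1).
    Returns (loss with ground-truth labels, expected loss of the reweighted stream).
    """
    z = p_dp / (p_dp + p_rn)  # true P(FN | observed negative) for this x
    ideal = -(p_ip + p_dp) * math.log(f) - p_rn * math.log(1.0 - f)
    stream = (-(p_ip + p_dp) * math.log(f)                  # IP and DP samples
              - (p_dp + p_rn) * (1.0 - z) * math.log(1.0 - f))  # observed negatives
    return ideal, stream
```

With a correct `p_fn`, the expected weight on log(1 - f) over observed negatives is P(negative | x) * (1 - z) = P(RN | x). This is why `expected_losses` returns two equal values. In practice the step-1 estimate is imperfect, so the correction is only as good as that model. The sum in `weighted_log_loss` is over stream samples, so divide by the number of impressions rather than the number of samples if a per-impression average is wanted.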