Paper Title

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Paper Authors

Xiang Li, Chao Wang, Jiwei Tan, Xiaoyi Zeng, Dan Ou, Bo Zheng

Paper Abstract

For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representations of items from multimodal features remains less investigated, considering that an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, which is equivalent to giving a fixed importance weight to each modality, or learn dynamic weights of different modalities for different items through techniques such as the attention mechanism. However, a problem is that there usually exists common redundant information across multiple modalities. The dynamic weights of different modalities computed from this redundant information may not correctly reflect the true importance of each modality. To address this, we explore the complementarity and redundancy of modalities by treating modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of the multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations, where a double-discriminator strategy is introduced. Finally, we obtain the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements over state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, and online A/B testing further demonstrates its effectiveness.
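The abstract describes a forward computation in three steps: attention weights are computed from the modality-specific features only (so shared, redundant information does not distort each modality's importance), a modality-invariant representation is extracted from the common information, and the two parts are combined into the item representation. The sketch below illustrates that data flow with NumPy; all dimensions and parameters are hypothetical, and the adversarially learned modality-invariant part is replaced by a simple average, since the paper's double-discriminator training is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical setup: one item with 3 modalities (e.g. image, title, statistics),
# each already embedded into a modality-specific vector (rows of S) and a
# candidate modality-invariant vector (rows of C) of dimension d.
num_modalities, d = 3, 8
S = rng.normal(size=(num_modalities, d))  # modality-specific features
C = rng.normal(size=(num_modalities, d))  # modality-invariant candidates

# Step 1: attention weights computed from modality-specific features only.
w_att = rng.normal(size=d)    # hypothetical attention parameter vector
weights = softmax(S @ w_att)  # one importance weight per modality, sums to 1

# Step 2: stand-in for the adversarially learned modality-invariant
# representation (in MARN this is shaped by the double-discriminator loss).
invariant_repr = C.mean(axis=0)

# Step 3: combine the weighted modality-specific parts with the invariant part.
specific_repr = weights @ S
item_repr = specific_repr + invariant_repr  # final multimodal item representation
```

The point of the separation is visible in step 1: because `weights` are derived from `S` alone, redundant information shared across modalities (which lives in `C`) cannot inflate any single modality's weight.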
