Top-K Ranking Deep Contextual Bandits for Information Selection Systems

Paper Title

Top-K Ranking Deep Contextual Bandits for Information Selection Systems

Authors

Jade Freeman, Michael Rawson

Abstract

In today's technology environment, information is abundant, dynamic, and heterogeneous in nature. Automated filtering and prioritization of information rests on distinguishing whether a piece of information adds substantial value toward one's goal. Contextual multi-armed bandits have been widely used to learn to filter content and prioritize it according to user interest or relevance. Learning-to-rank techniques optimize the relevance ranking of items, allowing content to be selected accordingly. We propose a novel approach to top-K ranking under the contextual multi-armed bandit framework. We model the stochastic reward function with a neural network, which allows a non-linear approximation of the relationship between rewards and contexts. We demonstrate the approach and evaluate its learning performance in experiments that use real-world data sets in simulated scenarios. Empirical results show that the approach performs well under complex reward structures and high-dimensional contextual features.
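
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: a small neural network predicts a reward for each item from the context, and the K highest-scoring items are selected. The class `TinyRewardNet`, the function `select_top_k`, the epsilon-greedy exploration rule, and all hyperparameters are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


class TinyRewardNet:
    """Hypothetical one-hidden-layer network: context vector -> predicted reward per item."""

    def __init__(self, n_features, n_hidden, n_items, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_items)]
        self.lr = lr

    def _hidden(self, x):
        # tanh gives the non-linear approximation mentioned in the abstract
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]

    def predict(self, x):
        h = self._hidden(x)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]

    def update(self, x, item, reward):
        """One SGD step on squared error for the single item whose reward was observed."""
        h = self._hidden(x)
        pred = sum(w * hi for w, hi in zip(self.w2[item], h))
        err = pred - reward
        # Back-propagate through tanh before the output weights are changed.
        grad_h = [err * self.w2[item][j] * (1 - h[j] ** 2) for j in range(len(h))]
        for j in range(len(h)):
            self.w2[item][j] -= self.lr * err * h[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * grad_h[j] * x[i]


def select_top_k(scores, k, epsilon, rng):
    """Epsilon-greedy top-K (an assumed exploration rule, not the paper's).

    With probability epsilon, return K distinct items chosen at random;
    otherwise return the K highest-scoring items in descending score order.
    """
    items = list(range(len(scores)))
    if rng.random() < epsilon:
        return rng.sample(items, k)
    return sorted(items, key=lambda a: scores[a], reverse=True)[:k]
```

In a bandit loop, one would call `predict` on the current context, pass the scores to `select_top_k`, show the resulting ranking, and call `update` for each item whose reward was observed. Other exploration strategies could be substituted for epsilon-greedy without changing the overall structure.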
