Paper title
Exploiting Hierarchical Dependence Structures for Unsupervised Rank Fusion in Information Retrieval
Paper authors
Paper abstract
The goal of rank fusion in information retrieval (IR) is to deliver a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central point is the fact that many non-obvious factors are involved in the estimation of relevance, inducing nonlinear interrelations between the data. The ability to model complex dependency relationships between random variables has become increasingly popular in the realm of information retrieval, and the need to further explore these dependencies for data fusion has been recently acknowledged. Copulas provide a framework to separate the dependence structure from the margins. Inspired by the theory of copulas, we propose a new unsupervised, dynamic, nonlinear, rank fusion method, based on a nested composition of non-algebraic function pairs. The dependence structure of the model is tailored by leveraging query-document correlations on a per-query basis. We experimented with three topic sets over CLEF corpora fusing 3 and 6 retrieval systems, comparing our method against the CombMNZ technique and other nonlinear unsupervised strategies. The experiments show that our fusion approach improves performance under explicit conditions, providing insight about the circumstances under which linear fusion techniques have comparable performance to nonlinear methods.
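For readers unfamiliar with the CombMNZ baseline named in the abstract, the following is a minimal sketch of its commonly cited formulation: each document's fused score is the sum of its normalized scores across systems, multiplied by the number of systems that retrieved it. The abstract does not describe the normalization or any implementation detail, so the min-max normalization, the function names `min_max_normalize` and `comb_mnz`, and the toy scores are all assumptions made for illustration. This is not the copula-inspired method the paper proposes.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: map every document to 1.0 to avoid dividing by zero.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(runs):
    """Fuse runs (one dict of doc -> score per system) with CombMNZ.

    fused(doc) = (sum of normalized scores) * (number of systems returning doc)
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three retrieval systems for a single query.
runs = [
    {"d1": 12.0, "d2": 9.0, "d3": 3.0},
    {"d2": 0.8, "d3": 0.6, "d4": 0.2},
    {"d1": 5.0, "d2": 4.0},
]
fused = comb_mnz(runs)
for doc, score in fused:
    print(f"{doc}\t{score:.4f}")
```

In this toy input, `d2` is returned by all three systems and so overtakes `d1`, which two systems rank first. CombMNZ is a linear combination scaled by a hit count, which is the kind of baseline the abstract contrasts with nonlinear fusion.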