Paper Title

Efficient and Effective Query Auto-Completion

Paper Authors

Simon Gog, Giulio Ermanno Pibiri, Rossano Venturini

Paper Abstract

Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for real-time responsiveness when operating in a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications such as Web search engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness relative to state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache Solr, which was not always able to meet the required service-level agreement.
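
The contrast the abstract draws between prefix search and an inverted index can be illustrated with a small sketch. This is not the system described in the paper: the completion strings, the function names, and the matching rule (every complete term must occur, and the last partial token must prefix some term) are assumptions chosen for illustration, and a sorted list stands in for the trie.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical completion set, kept sorted so a binary search can emulate a trie.
COMPLETIONS = sorted([
    "iphone case",
    "iphone charger",
    "leather iphone case",
    "samsung case",
])


def prefix_search(query):
    """Return only the completions that start with the whole query string."""
    start = bisect_left(COMPLETIONS, query)
    results = []
    for completion in COMPLETIONS[start:]:
        if not completion.startswith(query):
            break
        results.append(completion)
    return results


def build_inverted_index(completions):
    """Map each term to the set of ids of the completions containing it."""
    index = defaultdict(set)
    for doc_id, completion in enumerate(completions):
        for term in completion.split():
            index[term].add(doc_id)
    return index


INDEX = build_inverted_index(COMPLETIONS)


def conjunctive_search(query):
    """Return completions containing every complete term of the query and
    some term that starts with the last (partial) token, in any position."""
    tokens = query.split()
    if not tokens:
        return []
    *complete_terms, partial = tokens
    # Union of the posting sets of all terms that the partial token prefixes.
    candidates = set()
    for term, ids in INDEX.items():
        if term.startswith(partial):
            candidates |= ids
    # Intersect with the posting set of each fully typed term.
    for term in complete_terms:
        candidates &= INDEX.get(term, set())
    return [COMPLETIONS[i] for i in sorted(candidates)]


print(prefix_search("iphone ca"))
print(conjunctive_search("iphone ca"))
```

Under these assumptions, the query "iphone ca" yields only "iphone case" by prefix, while the inverted-index variant also surfaces "leather iphone case", which is the kind of added discovery power the abstract refers to.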
