Paper Title


NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets

Authors

Dibia, Victor

Abstract


Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA, a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM), as well as relevant snippets (RelSnip), a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research exploration (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large-scale search deployment. Code and documentation for NeuralQA are available as open source on GitHub (https://github.com/victordibia/neuralqa).
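The RelSnip idea described in the abstract (condensing a large retrieved document into a few small passages that a BERT reader can process quickly) can be illustrated with a minimal sketch. This is not NeuralQA's actual implementation; the helper `relsnip_condense`, its parameters, and the simple term-overlap scoring are illustrative assumptions standing in for the library's snippet-ranking step.

```python
# Minimal sketch of a RelSnip-style condensation step (illustrative only,
# not NeuralQA's implementation): split a long document into fixed-size
# passages, score each passage by query-term overlap, and keep the top-k
# so a document reader model only processes the most relevant text.
from collections import Counter


def relsnip_condense(document: str, query: str,
                     passage_words: int = 50, top_k: int = 2) -> str:
    """Return the top_k passages of `document` most relevant to `query`."""
    query_terms = set(query.lower().split())
    words = document.split()
    # Chunk the document into passages of `passage_words` words each.
    passages = [" ".join(words[i:i + passage_words])
                for i in range(0, len(words), passage_words)]

    def score(passage: str) -> int:
        # Count how often query terms occur in the passage.
        counts = Counter(passage.lower().split())
        return sum(counts[t] for t in query_terms)

    ranked = sorted(passages, key=score, reverse=True)
    return " ".join(ranked[:top_k])
```

In the library itself, scoring is delegated to the search engine's highlight fragments rather than computed client-side; the sketch above only conveys the shape of the technique: many words in, a short reader-friendly passage out.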
