Neural Databases

Paper Title

Neural Databases

Authors

James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

Abstract

In recent years, neural networks have shown impressive performance gains on long-standing AI problems, in particular answering queries from natural language text. These advances raise the question of whether they can be extended to a point where we can relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema. This paper presents a first step in answering that question. We describe NeuralDB, a database system with no pre-defined schema, in which updates and queries are given in natural language. We develop query processing techniques that build on the primitives offered by state-of-the-art natural language processing (NLP) methods. We begin by demonstrating that, at their core, recent NLP transformers powered by pre-trained language models can answer select-project-join (SPJ) queries if they are given the exact set of relevant facts. However, they cannot scale to non-trivial databases and cannot perform aggregation queries. Based on these findings, we describe a NeuralDB architecture that runs multiple Neural SPJ operators in parallel, each with a set of database sentences that can produce one of the answers to the query. The results of these operators are fed to an aggregation operator if needed. We describe an algorithm that learns how to create the appropriate sets of facts to be fed into each of the Neural SPJ operators. Importantly, this algorithm can be trained by the Neural SPJ operator itself. We experimentally validate the accuracy of NeuralDB and its components, showing that we can answer queries over thousands of sentences with very high accuracy.
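To make the data flow in the abstract concrete, here is a minimal sketch of its shape: facts stored as sentences, a support-set generator, one SPJ operator per support set run in parallel, and an optional aggregation step. Every name here (generate_support_sets, toy_spj, aggregate, neural_db_query) is hypothetical. The paper uses a learned support-set generator and transformer-based Neural SPJ operators; this sketch swaps in keyword filtering and a regular expression so it runs without a model, and it reduces a query to a topic word plus an aggregation name instead of free text.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Toy database: facts stored as natural-language sentences, with no schema.
FACTS = [
    "Alice likes jazz.",
    "Bob likes jazz.",
    "Carol likes rock.",
    "Alice lives in Paris.",
]


def generate_support_sets(query_topic: str, facts: list[str]) -> list[list[str]]:
    """Hypothetical support-set generator.

    NeuralDB learns this step; here we keep every fact that mentions the
    query topic and give each one its own single-sentence support set.
    """
    return [[fact] for fact in facts if query_topic in fact]


def toy_spj(query_topic: str, support_set: list[str]) -> Optional[str]:
    """Stand-in for a Neural SPJ operator (a transformer in the paper).

    Returns one answer derived from the support set, or None if the set
    does not answer the query.
    """
    for fact in support_set:
        match = re.fullmatch(r"(\w+) likes (\w+)\.", fact)
        if match and match.group(2) == query_topic:
            return match.group(1)
    return None


def aggregate(answers: list[Optional[str]], op: str) -> object:
    """Aggregation operator applied to the SPJ outputs (duplicates removed)."""
    found = {a for a in answers if a is not None}
    if op == "count":
        return len(found)
    if op == "set":
        return sorted(found)
    raise ValueError(f"unsupported aggregation: {op}")


def neural_db_query(query_topic: str, op: str, facts: list[str]) -> object:
    support_sets = generate_support_sets(query_topic, facts)
    # One SPJ call per support set, run in parallel; only the aggregation
    # step sees all of the intermediate answers.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda s: toy_spj(query_topic, s), support_sets))
    return aggregate(answers, op)


print(neural_db_query("jazz", "set", FACTS))    # ['Alice', 'Bob']
print(neural_db_query("jazz", "count", FACTS))  # 2
```

Because each support set is processed independently, no single operator has to read the whole database. Counting and other aggregates happen outside the SPJ operators. This mirrors the abstract's observation that a transformer given all facts at once neither scales nor aggregates.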