Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Paper Title

Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Authors

Jun Wang, Patrick Ng, Alexander Hanbo Li, Jiarong Jiang, Zhiguo Wang, Ramesh Nallapati, Bing Xiang, Sudipta Sengupta

Abstract

Most recent research on Text-to-SQL semantic parsing relies on either the parser itself or simple heuristic-based approaches to understand the natural language query (NLQ). When synthesizing a SQL query, the parser has no explicit semantic information about the NLQ available, which leads to poor generalization. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can rely only on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intents and identifies entities in the query. The NEL model links typed entities to the schema and cell values in the database. The parser model leverages the available semantic information and linking results, and synthesizes tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.
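The three-module data flow described in the abstract (NER, then NEL, then NSP) can be illustrated with a toy sketch. This is not the authors' implementation: the learned neural models are replaced here by hand-written rules, and every name in it (`Entity`, `Link`, `recognize`, `link`, `synthesize`, `answer`, the two-column table `t`) is a hypothetical stand-in chosen for illustration. It only shows how typed entities and their links to the schema and cell values could feed SQL synthesis.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entity:
    text: str   # surface token found in the question
    type: str   # "column" or "value" (hypothetical tag set)

@dataclass
class Link:
    column: str                  # schema column the entity refers to
    value: Optional[str] = None  # set only for cell-value entities

# Toy database: one table `t` with two columns (assumed for illustration).
COLUMNS = ["nation", "gold"]
ROWS = [("Norway", 16), ("Germany", 12)]

def recognize(question: str) -> List[Entity]:
    """Rule-based stand-in for the NER module: tag tokens by entity type."""
    cells = {str(c).lower() for row in ROWS for c in row}
    entities = []
    for token in question.lower().rstrip("?").split():
        if token in COLUMNS:
            entities.append(Entity(token, "column"))
        elif token in cells:
            entities.append(Entity(token, "value"))
    return entities

def link(entities: List[Entity]) -> List[Link]:
    """Rule-based stand-in for the NEL module: map typed entities to schema or cells."""
    links = []
    for ent in entities:
        if ent.type == "column":
            links.append(Link(column=ent.text))
            continue
        for row in ROWS:
            for col, cell in zip(COLUMNS, row):
                if str(cell).lower() == ent.text:
                    links.append(Link(column=col, value=str(cell)))
    return links

def synthesize(links: List[Link]) -> str:
    """Template stand-in for the NSP module: build SELECT ... WHERE from the links."""
    select = [l.column for l in links if l.value is None]
    where = [l for l in links if l.value is not None]
    sql = f"SELECT {', '.join(select) or '*'} FROM t"
    if where:
        # String interpolation is acceptable only because the toy data is fixed.
        sql += " WHERE " + " AND ".join(f"{l.column} = '{l.value}'" for l in where)
    return sql

def answer(question: str):
    """Run the three stages, then execute the SQL on an in-memory SQLite table."""
    sql = synthesize(link(recognize(question)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (nation TEXT, gold INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return sql, rows
```

Calling `answer("How many gold did Norway win?")` tags `gold` as a column and `norway` as a cell value, links the value to the `nation` column, and executes the resulting query. Note that the exact-match rules used here are precisely the kind of string matching the abstract argues is insufficient. In the described framework, both the entity typing and the linking are produced by neural models that jointly encode the query and the database.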
