Paper Title
ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries
Paper Authors
Abstract
Translating natural language utterances to executable queries is a helpful technique in making the vast amount of data stored in relational databases accessible to a wider range of non-tech-savvy end users. Prior work in this area has largely focused on textual input that is linguistically correct and semantically unambiguous. However, real-world user queries are often succinct, colloquial, and noisy, resembling the input of a search engine. In this work, we introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL) to achieve robust text-to-SQL modeling over natural language search (NLS) questions. Due to the lack of evaluation data, we curate a new dataset of NLS questions and demonstrate the efficacy of our approach. ColloQL's superior performance extends to well-formed text, achieving 84.9% (logical) and 90.7% (execution) accuracy on the WikiSQL dataset, making it, to the best of our knowledge, the highest performing model that does not use execution-guided decoding.
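
The two WikiSQL figures quoted above measure different things: logical accuracy asks whether the predicted query matches the reference query, while execution accuracy asks only whether both queries return the same result. The following is a minimal sketch of that distinction, not the paper's evaluation code. It assumes a toy in-memory SQLite table, and it uses whitespace- and case-normalized string equality as a rough stand-in for a structured logical-form comparison. The table `players` and the helpers `logical_match`, `execution_match`, and `accuracy` are hypothetical names introduced for illustration.

```python
import sqlite3

# Toy database (hypothetical data, not taken from WikiSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("A", "Lakers", 25), ("B", "Lakers", 10), ("C", "Bulls", 30)],
)


def logical_match(pred: str, gold: str) -> bool:
    """Assumed simplification: compare queries as normalized strings."""
    def normalize(sql: str) -> str:
        return " ".join(sql.lower().split())
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Compare result sets, ignoring row order; an invalid prediction counts as a miss."""
    try:
        pred_rows = sorted(conn.execute(pred).fetchall(), key=repr)
    except sqlite3.Error:
        return False
    gold_rows = sorted(conn.execute(gold).fetchall(), key=repr)
    return pred_rows == gold_rows


def accuracy(pairs, match) -> float:
    """Fraction of (pred, gold) pairs for which `match` returns True."""
    return sum(match(p, g) for p, g in pairs) / len(pairs)


gold = "SELECT COUNT(name) FROM players WHERE team = 'Lakers'"
pred = "SELECT COUNT(team) FROM players WHERE team = 'Lakers'"  # different column
pairs = [(pred, gold), (gold, gold)]

logical_acc = accuracy(pairs, logical_match)
execution_acc = accuracy(pairs, lambda p, g: execution_match(conn, p, g))
```

Here the first prediction counts a different column than the reference, so it fails the logical comparison, yet both queries return the same count and it passes the execution comparison. Cases of this kind are one reason execution accuracy can exceed logical accuracy on the same predictions.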