论文标题
SQL的正式化用零
A Formalization of SQL with Nulls
论文作者
论文摘要
SQL是世界上最受欢迎的声明语言,构成了数十亿美元的数据库行业的基础。尽管SQL已被标准化,但完整的标准是基于模棱两可的自然语言而不是正式规范。商业SQL实现以不同的方式解释标准,因此,给定相同的输入数据,相同的查询可以根据其运行的SQL系统产生不同的结果。即使对于特定系统,SQL的所有广泛使用功能的机械检查也是一个开放的问题。缺乏精心理解的正式语义使得很难验证数据库实现的健全性。 尽管过去设计了SQL片段的正式语义,但它们通常不支持套装和包装操作,侧向连接,嵌套子征服以及至关重要的零值。零值以深刻的方式使SQL的语义复杂化,类似于其他编程语言中的无指针或副作用。由于某些SQL查询在没有零值的情况下是等效的,但是当应用于包含不完整数据的表时产生不同的结果,因此忽略零值的语义能够证明在现实数据库中不符合不明智的质量等价。 仅最近提出了支持上述所有功能的SQL的正式语义。在本文中,我们报告了我们的SQL语义机械化,涵盖了集合/袋操作,侧向连接,嵌套的子征服和编写COQ证明助手编写的Nulls,并描述了关键的元心疗法的验证。此外,我们能够使用相同的框架将平面关系的语义(无数值)形式化,并将其正常形式的经认证翻译显示为SQL。
SQL is the world's most popular declarative language, forming the basis of the multi-billion-dollar database industry. Although SQL has been standardized, the full standard is based on ambiguous natural language rather than formal specification. Commercial SQL implementations interpret the standard in different ways, so that, given the same input data, the same query can yield different results depending on the SQL system it is run on. Even for a particular system, mechanically checked formalization of all widely-used features of SQL remains an open problem. The lack of a well-understood formal semantics makes it very difficult to validate the soundness of database implementations. Although formal semantics for fragments of SQL were designed in the past, they usually did not support set and bag operations, lateral joins, nested subqueries, and, crucially, null values. Null values complicate SQL's semantics in profound ways analogous to null pointers or side-effects in other programming languages. Since certain SQL queries are equivalent in the absence of null values, but produce different results when applied to tables containing incomplete data, semantics which ignore null values are able to prove query equivalences that are unsound in realistic databases. A formal semantics of SQL supporting all the aforementioned features was only proposed recently. In this paper, we report about our mechanization of SQL semantics covering set/bag operations, lateral joins, nested subqueries, and nulls, written in the Coq proof assistant, and describe the validation of key metatheoretic properties. Additionally, we are able to use the same framework to formalize the semantics of a flat relational calculus (with null values), and show a certified translation of its normal forms into SQL.