Title

The algebra and machine representation of statistical models

Author

Patterson, Evan

Abstract

As the twin movements of open science and open source bring an ever greater share of the scientific process into the digital realm, new opportunities arise for the meta-scientific study of science itself, including of data science and statistics. Future science will likely see machines play an active role in processing, organizing, and perhaps even creating scientific knowledge. To make this possible, large engineering efforts must be undertaken to transform scientific artifacts into useful computational resources, and conceptual advances must be made in the organization of scientific theories, models, experiments, and data. This dissertation takes steps toward digitizing and systematizing two major artifacts of data science, statistical models and data analyses. Using tools from algebra, particularly categorical logic, a precise analogy is drawn between models in statistics and logic, enabling statistical models to be seen as models of theories, in the logical sense. Statistical theories, being algebraic structures, are amenable to machine representation and are equipped with morphisms that formalize the relations between different statistical methods. Turning from mathematics to engineering, a software system for creating machine representations of data analyses, in the form of Python or R programs, is designed and implemented. The representations aim to capture the semantics of data analyses, independent of the programming language and libraries in which they are implemented.