A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing

Paper Title

A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing

Authors

Kartik Goyal, Chris Dyer, Christopher Warren, Max G'Sell, Taylor Berg-Kirkpatrick

Abstract

We propose a deep and interpretable probabilistic generative model to analyze glyph shapes in printed Early Modern documents. We focus on clustering extracted glyph images into underlying templates in the presence of multiple confounding sources of variance. Our approach introduces a neural editor model that first generates well-understood printing phenomena, such as spatial perturbations, from template parameters via interpretable latent variables, and then modifies the result by generating a non-interpretable latent vector responsible for inking variations, jitter, noise from the archiving process, and other unforeseen phenomena associated with Early Modern printing. Critically, by introducing an inference network whose input is restricted to the visual residual between the observation and the interpretably modified template, we are able to control and isolate what the vector-valued latent variable captures. We show that our approach outperforms both rigid interpretable clustering baselines (Ocular) and overly flexible deep generative models (VAE) on the task of completely unsupervised discovery of typefaces in mixed-font documents.
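The "visual residual" idea in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: the names `shift_template`, `visual_residual`, and `best_shift` are hypothetical, the only interpretable latent variable modeled is an integer pixel offset, a brute-force search stands in for inference over that variable, and no neural network is included. It only shows that, once the template has been interpretably modified, what remains in the residual is the unexplained variation (here, a lighter-inked pixel) that a restricted inference network would be given as input.

```python
# Toy sketch (assumed names, not the paper's code): interpretable shift first,
# then the residual that a restricted inference network would receive.

def shift_template(template, dx, dy, fill=0.0):
    """Translate a 2D grid by (dx, dy) pixels, padding uncovered cells with `fill`."""
    h, w = len(template), len(template[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = template[sy][sx]
    return out


def visual_residual(observation, rendered):
    """Pixel-wise difference between the observation and the shifted template."""
    return [[o - r for o, r in zip(orow, rrow)]
            for orow, rrow in zip(observation, rendered)]


def best_shift(observation, template, max_shift=1):
    """Brute-force stand-in for inferring the interpretable offset:
    pick the shift with the smallest squared residual."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            res = visual_residual(observation, shift_template(template, dx, dy))
            err = sum(v * v for row in res for v in row)
            if best is None or err < best[0]:
                best = (err, dx, dy, res)
    return best


# A 4x4 "glyph template": a short vertical bar in column 1.
TEMPLATE = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Observation: the bar shifted one pixel right, with one lighter-inked pixel.
observation = shift_template(TEMPLATE, dx=1, dy=0)
observation[1][2] = 0.8

err, dx, dy, residual = best_shift(observation, TEMPLATE)
print("inferred offset:", (dx, dy))
print("residual at the lightly inked pixel:", residual[1][2])
```

In this sketch the spatial offset is fully explained by the interpretable step, so the residual is nonzero only where the inking differs from the template. Restricting a learned encoder to that residual is one way to read the abstract's claim about controlling what the vector-valued latent variable captures.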
