Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling

Paper title

Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling

Authors

Lopez, Romain; Tagasovska, Nataša; Ra, Stephen; Cho, Kyunghyun; Pritchard, Jonathan K.; Regev, Aviv

Abstract

Latent variable models such as the Variational Auto-Encoder (VAE) have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the interpretability of latent variables as biological processes that define a cell's identity. Outside of biological applications, this problem is commonly referred to as learning disentangled representations. Although several disentanglement-promoting variants of the VAE were introduced, and applied to single-cell genomics data, this task has been shown to be infeasible from independent and identically distributed measurements, without additional structure. Instead, recent methods propose to leverage non-stationary data, as well as the sparse mechanism shift assumption in order to learn disentangled representations with a causal semantic. Here, we extend the application of these methodological advances to the analysis of single-cell genomics data with genetic or chemical perturbations. More precisely, we propose a deep generative model of single-cell gene expression data for which each perturbation is treated as a stochastic intervention targeting an unknown, but sparse, subset of latent variables. We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization. Finally, we apply those approaches to two real-world large-scale gene perturbation data sets and find that models that exploit the sparse mechanism shift hypothesis surpass contemporary methods on a transfer learning task. We implement our new model and benchmarks using the scvi-tools library, and release it as open-source software at https://github.com/Genentech/sVAE.
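The sparse mechanism shift idea in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model or the scvi-tools API: the function names, the Gaussian latent prior, the fixed mean shift used as the intervention, the linear decoder, and the Poisson count model are all hypothetical choices made only for illustration.

```python
import math
import random


def sample_sparse_targets(n_perturbations, n_latent, n_targets, rng):
    """For each perturbation, pick a small random subset of latent indices it shifts."""
    return [sorted(rng.sample(range(n_latent), n_targets))
            for _ in range(n_perturbations)]


def sample_latent(targets, shift, n_latent, rng):
    """Draw one latent vector: standard normal, mean-shifted on targeted dims only."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
    for j in targets:
        z[j] += shift  # assumed form of the stochastic intervention
    return z


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_cells_per_condition=200, n_latent=5, n_genes=8,
             n_perturbations=3, n_targets=1, shift=3.0, seed=0):
    """Simulate control cells (condition 0) and perturbed cells (conditions 1..K)."""
    rng = random.Random(seed)
    targets = sample_sparse_targets(n_perturbations, n_latent, n_targets, rng)
    # Hypothetical linear decoder: one weight row per gene.
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_latent)]
               for _ in range(n_genes)]
    cells = []
    for condition in range(n_perturbations + 1):
        hit = [] if condition == 0 else targets[condition - 1]
        for _ in range(n_cells_per_condition):
            z = sample_latent(hit, shift, n_latent, rng)
            counts = []
            for row in weights:
                log_rate = sum(w * zj for w, zj in zip(row, z))
                log_rate = max(-3.0, min(3.0, log_rate))  # keep rates modest
                counts.append(sample_poisson(math.exp(log_rate), rng))
            cells.append((condition, z, counts))
    return targets, cells


if __name__ == "__main__":
    targets, cells = simulate()
    for k, hit in enumerate(targets, start=1):
        print(f"perturbation {k} shifts latent dims {hit}")
```

In this sketch the targets are known because they are simulated; in the setting the abstract describes, the targeted subset of latent variables is unknown and has to be inferred from the perturbed data.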

Paper title