Paper title
Computing the probability of gene trees concordant with the species tree in the multispecies coalescent
Paper authors
Paper abstract
The multispecies coalescent process models the genealogical relationships of genes sampled from several species, enabling useful predictions about phenomena such as the discordance between the gene tree and the species phylogeny due to incomplete lineage sorting. Conversely, knowledge of large collections of gene trees can inform us about several aspects of the species phylogeny, such as its topology and ancestral population sizes. A fundamental open problem in this context is how to efficiently compute the probability of a gene tree topology, given the species phylogeny. Although a number of algorithms for this task have been proposed, they either produce approximate results, or, when they are exact, they do not scale to large data sets. In this paper, we present some progress towards exact and efficient computation of the probability of a gene tree topology. We provide a new algorithm that, given a species tree and the number of genes sampled for each species, calculates the probability that the gene tree topology will be concordant with the species tree. Moreover, we provide an algorithm that computes the probability of any specific gene tree topology concordant with the species tree. Both algorithms run in polynomial time and have been implemented in Python. Experiments show that they are able to analyse data sets where thousands of genes are sampled, in a matter of minutes to hours.
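As a point of reference for the concordance probability described above, the smallest non-trivial case has a well-known closed form. For a three-species tree ((A,B),C) with one gene sampled per species, the gene tree topology matches the species tree with probability 1 - (2/3)e^{-t}, where t is the length of the internal branch in coalescent units. The sketch below only evaluates this textbook formula. It is not the polynomial-time algorithm presented in the paper, and the function name is an illustrative assumption.

```python
import math


def concordance_probability_three_taxa(t: float) -> float:
    """Probability that the gene tree topology matches the species tree
    ((A,B),C) when one gene is sampled per species.

    t is the internal branch length in coalescent units (assumed >= 0).
    The lineages from A and B fail to coalesce on the internal branch with
    probability exp(-t); in that case the three lineages coalesce in the
    root population and only 1 of the 3 equally likely topologies is
    concordant, so 2/3 of that mass is discordant.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    # Short internal branches leave concordance close to the 1/3 baseline;
    # long internal branches push it towards 1.
    for t in (0.0, 0.5, 1.0, 5.0):
        print(f"t = {t:>3}: P(concordant) = {concordance_probability_three_taxa(t):.4f}")
```

With a zero-length internal branch the three topologies are equally likely, so the probability is 1/3. Incomplete lineage sorting becomes negligible as t grows, and the probability approaches 1.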