论文标题

从嘈杂的实验中重建超级树

Reconstructing Ultrametric Trees from Noisy Experiments

论文作者

Arunachaleswaran, Eshwar Ram, De, Anindya, Kannan, Sampath

论文摘要

重建进化树或系统发育的问题在计算生物学中引起了极大的兴趣。这个问题的一个流行模型假设我们得到了未知二进制树的一组叶子(当前物种),以及在叶子的三倍(a,b,c)上的“实验”的结果,这些叶子的结果与最不含最深的祖先返回了这对。如果假定树是超级测量的(即,所有根叶路径的长度相同),则可以将实验视为返回最接近的叶子。在此模型中,有效的算法是树的重建。 实际上,由于运行这些“实验”的数据本身是由随机进化过程产生的,因此这些实验很嘈杂。在所有合理的进化模型中,如果导致叶子的分支在树上非常接近的祖先彼此分离的三重分支,则实验的结果应接近均匀的随机。在此激励的情况下,我们考虑了一个模型,其中任何三重噪声仅取决于三个成对距离(称为基于距离的噪声)。 我们的结果如下:1。假设未知树中每个边的长度至少为$ \ tilde {o}(\ frac {1} {\ sqrt n})$ root路径长度的分数。然后,我们给出了一种有效的算法,以重建树木的拓扑结构,以作为广泛的基于距离的噪声模型系列。此外,我们表明,如果边缘渐近短,则拓扑重建在理论上是不可能的。 2。此外,对于基于特定的基于距离的噪声模型(我们称为均匀的噪声模型),我们表明边缘权重也可以在边缘长度的相同定量下限下大致重建。

The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of `experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these `experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple is just dependent on the three pairwise distances (referred to as distance based noise). Our results are the following: 1. Suppose the length of every edge in the unknown tree is at least $\tilde{O}(\frac{1}{\sqrt n})$ fraction of the length of a root-leaf path. Then, we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible. 2. Further, for a specific distance-based noise model--which we refer to as the homogeneous noise model--we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源