Paper title

Aligning biological sequences by exploiting residue conservation and coevolution

Authors

Anna Paola Muntoni, Andrea Pagnani, Martin Weigt, Francesco Zamponi

Abstract

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is sequence alignment, i.e. arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specific features such as conservation in sequences, but assume that different positions evolve independently. In recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and therefore to be universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
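The distinction the abstract draws between profile models and coevolution-aware models can be illustrated with a toy example. The sketch below is not DCAlign and does not implement message passing. It only compares how an independent-site profile score and a Potts-type energy with pairwise couplings treat two short sequences. All names (`profile_log_score`, `potts_energy`), the two-letter alphabet, and the numeric parameters are hypothetical, chosen purely for illustration.

```python
import math
from itertools import combinations


def profile_log_score(seq, freqs):
    """Log-probability under a profile model: every position is independent."""
    return sum(math.log(freqs[i][a]) for i, a in enumerate(seq))


def potts_energy(seq, fields, couplings):
    """Potts-type energy: single-site fields plus pairwise couplings.

    Lower energy means a more favorable sequence.
    """
    energy = -sum(fields[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        # Pairs without an entry in `couplings` contribute nothing.
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical toy parameters: positions 0 and 2 are assumed to covary,
# favoring the pairs A..A and C..C; position 1 is conserved as A.
freqs = [{"A": 0.5, "C": 0.5}, {"A": 0.9, "C": 0.1}, {"A": 0.5, "C": 0.5}]
fields = [{a: math.log(p) for a, p in site.items()} for site in freqs]
couplings = {(0, 2): {("A", "A"): 1.0, ("C", "C"): 1.0}}

covarying = "AAA"   # respects the assumed 0-2 covariation
mismatched = "AAC"  # breaks it

# The profile model cannot tell the two sequences apart ...
same_profile = math.isclose(
    profile_log_score(covarying, freqs), profile_log_score(mismatched, freqs)
)
# ... while the pairwise model favors the covarying one by the coupling value.
energy_gap = potts_energy(mismatched, fields, couplings) - potts_energy(
    covarying, fields, couplings
)
```

In this toy setting both sequences receive the same profile score because each site is scored on its own, whereas the pairwise term separates them. That kind of signal is what a coevolution-aware aligner can exploit and a profile model cannot.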
