A step towards neural genome assembly

Paper title

A step towards neural genome assembly

Authors

Lovro Vrček, Petar Veličković, Mile Šikić

Abstract

De novo genome assembly focuses on finding connections between a vast number of short sequences in order to reconstruct the original genome. The central problem of genome assembly can be described as finding a Hamiltonian path through a large directed graph, with the constraint that an unknown number of nodes and edges should be avoided. However, due to local structures in the graph and biological features, the problem can be reduced to graph simplification, which includes the removal of redundant information. Motivated by recent advancements in graph representation learning and neural execution of algorithms, in this work we train an MPNN model with a max-aggregator to execute several algorithms for graph simplification. We show that the algorithms were learned successfully and can be scaled to graphs up to 20 times larger than the ones used in training. We also test on graphs obtained from real-world genomic data: those of a lambda phage and E. coli.
