Paper Title

Protein Sequence and Structure Co-Design with Equivariant Translation

Authors

Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang

Abstract

Proteins are macromolecules that perform essential functions in all living organisms. Designing novel proteins with specific structures and desired functions has been a long-standing challenge in the field of bioengineering. Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models, both of which suffer from high inference costs. In this paper, we propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state from random initialization, based on context features given a priori. Our model consists of a trigonometry-aware encoder that reasons geometrical constraints and interactions from context features, and a roto-translation equivariant decoder that translates protein sequence and structure interdependently. Notably, all protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process. Experimental results across multiple tasks show that our model outperforms previous state-of-the-art baselines by a large margin, and is able to design proteins of high fidelity as regards both sequence and structure, with running time orders of magnitude less than sampling-based methods.
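
The two ideas the abstract leans on, updating every residue in one shot at each step and keeping the structure update roto-translation equivariant, can be illustrated with a toy NumPy sketch. This is not the authors' model: `toy_update`, `refine`, and the use of a target-distance matrix as `context` are hypothetical stand-ins chosen only to make the properties easy to check.

```python
import numpy as np


def toy_update(coords, logits, context, step=0.1):
    """One toy 'translation' step that updates all residues at once.

    coords:  (N, 3) residue positions
    logits:  (N, 20) per-residue amino-acid scores
    context: (N, N) hypothetical target distances (an assumption of this sketch)
    """
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) relative vectors
    dist = np.linalg.norm(diff, axis=-1)             # (N, N), rotation invariant
    # Invariant pair weights: positive when a pair is closer than its target.
    w = np.tanh(context - dist)
    np.fill_diagonal(w, 0.0)
    # Equivariant structure update: a weighted sum of relative vectors.
    new_coords = coords + step * (w[..., None] * diff).sum(axis=1) / n
    # Invariant sequence update: mix logits using the same pair weights.
    new_logits = logits + step * (w @ logits) / n
    return new_coords, new_logits


def refine(coords, logits, context, n_steps=10):
    """Apply the one-shot update a fixed number of times."""
    for _ in range(n_steps):
        coords, logits = toy_update(coords, logits, context)
    return coords, logits


rng = np.random.default_rng(0)
N = 8
coords0 = rng.normal(size=(N, 3))        # random structure initialization
logits0 = rng.normal(size=(N, 20))       # random sequence initialization
c = rng.uniform(3.0, 8.0, size=(N, N))
context = (c + c.T) / 2                  # symmetric toy context

out_coords, out_logits = refine(coords0, logits0, context)

# Build a random proper rotation q and a translation t.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1
t = rng.normal(size=3)

# Refining a rotated and shifted input should give the rotated and shifted output.
rot_coords, rot_logits = refine(coords0 @ q.T + t, logits0, context)
print("coords equivariant:", np.allclose(rot_coords, out_coords @ q.T + t))
print("logits invariant:", np.allclose(rot_logits, out_logits))
```

Because the pair weights depend only on distances, rotating or shifting the input leaves the sequence scores unchanged and moves the output coordinates by the same rigid transform. Each step costs one pass over all residues, in contrast to autoregressive generation, which needs one pass per residue.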
