Paper Title
SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning
Paper Authors
Paper Abstract
Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure. In this paper, we provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.