Transition1x -- a Dataset for Building Generalizable Reactive Machine Learning Potentials

Paper title

Transition1x -- a Dataset for Building Generalizable Reactive Machine Learning Potentials

Authors

Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther

Abstract

Machine Learning (ML) models have, in contrast to their usefulness in molecular dynamics studies, had limited success as surrogate potentials for reaction barrier search. This is due to the scarcity of training data in the relevant transition-state regions of chemical space. Currently available datasets for training ML models on small molecular systems almost exclusively contain configurations at or near equilibrium. In this work, we present Transition1x, a dataset containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The data was generated by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions while saving intermediate calculations. We train state-of-the-art equivariant graph message-passing neural network models on Transition1x and cross-validate on the popular ANI1x and QM9 datasets. We show that ML models cannot learn features in transition-state regions solely by training on hitherto popular benchmark datasets. Transition1x is a new, challenging benchmark that will provide an important step towards developing next-generation ML force fields that also work far from equilibrium configurations and in reactive systems.
