Paper Title
Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations
Paper Authors
Paper Abstract
Computer-based de-novo design of functional molecules is one of the most prominent challenges in cheminformatics today. As a result, generative and evolutionary inverse-design methods from the field of artificial intelligence have emerged at a rapid pace, with the aim of optimizing molecules for a particular chemical property. These models explore the chemical space 'indirectly', by learning latent spaces, policies, or distributions, or by applying mutations to populations of molecules. However, the recent development of the SELFIES string representation of molecules, a surjective alternative to SMILES, has made other techniques possible. Based on SELFIES, we therefore propose PASITHEA, a direct gradient-based molecule optimization that applies inceptionism techniques from computer vision. PASITHEA exploits gradients by directly reversing the learning process of a neural network that is trained to predict real-valued chemical properties. Effectively, this forms an inverse regression model, which is capable of generating molecular variants optimized for a certain property. Although our results are preliminary, we observe a shift in the distribution of a chosen property during inverse training, a clear indication of PASITHEA's viability. A striking property of inceptionism is that we can directly probe the model's understanding of the chemical space it was trained on. We expect that extending PASITHEA to larger datasets, larger molecules, and more complex properties will lead to advances in the design of new functional molecules as well as in the interpretation and explanation of machine learning models.
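The inverse-training idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy alphabet of SELFIES-style tokens and replaces the trained neural network with a hypothetical linear model whose weights are frozen. The names `TOKENS`, `WEIGHTS`, `predict`, `dream`, and `decode` are invented for illustration, and no SELFIES library is used. The sketch shows only the core step: the model weights stay fixed while gradient descent updates the one-hot input until the predicted property approaches a target value, after which each position is decoded by taking its largest entry.

```python
# Toy sketch of inverse training ("dreaming") on a one-hot string encoding.
# Everything here is hypothetical: the token list, the weights, and the
# linear "property model" stand in for a trained neural network.

TOKENS = ["[C]", "[N]", "[O]", "[F]"]   # assumed toy alphabet
WEIGHTS = [0.5, -0.2, -0.6, 0.3]        # assumed frozen per-token weights


def one_hot(sequence):
    """Encode a list of tokens as one row of 0.0/1.0 values per position."""
    return [[1.0 if tok == t else 0.0 for t in TOKENS] for tok in sequence]


def predict(x):
    """Frozen stand-in model: weighted sum over all positions and tokens."""
    return sum(v * w for row in x for v, w in zip(row, WEIGHTS))


def dream(x, target, lr=0.1, steps=100):
    """Gradient descent on the INPUT x; the model weights never change."""
    x = [row[:] for row in x]            # copy so the caller's input is untouched
    for _ in range(steps):
        err = predict(x) - target        # loss is err ** 2
        for row in x:
            for t, w in enumerate(WEIGHTS):
                row[t] -= lr * 2.0 * err * w   # d(loss)/d(row[t]) = 2 * err * w
    return x


def decode(x):
    """Map each position back to the token with the largest entry."""
    return [TOKENS[max(range(len(TOKENS)), key=row.__getitem__)] for row in x]


start = ["[O]", "[N]", "[O]"]
dreamed = dream(one_hot(start), target=1.5)
result = decode(dreamed)
```

In this toy setting the decoded string scores higher under the frozen model than the starting string, although the discretization step means it does not have to reach the target exactly. With a real network, the hand-written gradient would be replaced by automatic differentiation. The surjectivity of SELFIES is what would allow any decoded token sequence to be read back as a valid molecule.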