Paper Title
Conditional Neural Processes for Molecules
Paper Authors
Abstract
Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space, and they are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in chemoinformatics, as well as to an alternative model for transfer learning based on pre-training and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs and discuss shortcomings of the model in uncertainty quantification.
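The abstract's description of a CNP, which conditions on a few context observations of a function and outputs a Gaussian predictive distribution at new inputs, can be sketched as follows. This is a minimal illustrative example with untrained random weights, not the paper's actual model; all names, dimensions, and the two-layer MLP shapes are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    # Random weights for a small MLP (illustration only; a real CNP is
    # trained end-to-end on many related functions).
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

x_dim, y_dim, r_dim = 2, 1, 8           # hypothetical input/output/latent sizes
encoder = mlp_params([x_dim + y_dim, 16, r_dim])
decoder = mlp_params([r_dim + x_dim, 16, 2 * y_dim])

def cnp_predict(x_context, y_context, x_target):
    # 1. Encode each context pair (x_i, y_i) into a representation r_i.
    r_i = mlp(encoder, np.concatenate([x_context, y_context], axis=-1))
    # 2. Aggregate with a permutation-invariant mean over the context set.
    r = r_i.mean(axis=0, keepdims=True)
    # 3. Decode (r, x*) into a Gaussian predictive mean and std per target.
    inp = np.concatenate([np.repeat(r, len(x_target), axis=0), x_target], axis=-1)
    h = mlp(decoder, inp)
    mu, log_sigma = h[:, :y_dim], h[:, y_dim:]
    return mu, np.exp(log_sigma)        # exp keeps the predictive std positive

# Few-shot setting: 5 context observations, predictions at 3 target inputs.
x_c = rng.normal(size=(5, x_dim))
y_c = rng.normal(size=(5, y_dim))
x_t = rng.normal(size=(3, x_dim))
mu, sigma = cnp_predict(x_c, y_c, x_t)
print(mu.shape, sigma.shape)  # (3, 1) (3, 1)
```

The mean aggregation in step 2 is what lets the model condition on a variable number of observations, and the predictive standard deviation is what a Bayesian optimization loop, like the experiment mentioned in the abstract, would use to trade off exploration against exploitation.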