Paper title
CASTELO: Clustered Atom Subtypes aidEd Lead Optimization -- a combined machine learning and molecular modeling method
Paper authors
Abstract
Drug discovery is a multi-stage process that comprises two costly major steps: pre-clinical research and clinical trials. Among its stages, lead optimization easily consumes more than half of the pre-clinical budget. We propose a combined machine learning and molecular modeling approach that automates the lead optimization workflow \textit{in silico}. The initial data are collected with physics-based molecular dynamics (MD) simulations. Contact matrices are calculated as the preliminary features extracted from the simulations. To take advantage of the temporal information in the simulations, we enhance the contact matrix data with a temporal dynamism representation, and the enhanced data are then modeled with an unsupervised convolutional variational autoencoder (CVAE). Finally, a conventional clustering method and a CVAE-based clustering method are compared using metrics to rank the submolecular structures and propose potential candidates for lead optimization. Without requiring an extensive structure-activity relationship database, our method provides new hints for drug modification hotspots that can be used to improve drug efficacy. Compared with the conventional labor-intensive process, our workflow can potentially reduce the lead optimization turnaround time from months or years to days, and thus may become a valuable tool for medical researchers.
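To make the contact-matrix feature concrete, the following is a minimal sketch of how a binary contact matrix could be computed for each MD frame. It is not the authors' implementation: the function names `contact_matrix` and `trajectory_contacts`, the 8.0 Å cutoff, and the choice to leave the diagonal at zero are illustrative assumptions, since the abstract does not specify the atom selection, the cutoff, or the temporal representation.

```python
import math
from typing import List, Sequence

Coord = Sequence[float]  # one atom position, e.g. (x, y, z) in angstroms


def contact_matrix(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Binary contact matrix for one frame.

    Entry [i][j] is 1 when atoms i and j lie within `cutoff` of each other.
    The cutoff value is an assumption, not a value taken from the paper.
    The diagonal is left at 0 (no self-contacts).
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Contacts are symmetric, so fill both triangles.
                matrix[i][j] = 1
                matrix[j][i] = 1
    return matrix


def trajectory_contacts(
    frames: Sequence[Sequence[Coord]], cutoff: float = 8.0
) -> List[List[List[int]]]:
    """One contact matrix per frame, kept in simulation (time) order."""
    return [contact_matrix(frame, cutoff) for frame in frames]


if __name__ == "__main__":
    # Two toy frames of a three-atom system (hypothetical coordinates).
    frames = [
        [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    ]
    for t, matrix in enumerate(trajectory_contacts(frames)):
        print(f"frame {t}: {matrix}")
```

Keeping the per-frame matrices in time order preserves the temporal information that the abstract says is exploited later. How that ordering is turned into the temporal dynamism representation fed to the CVAE is not described in the abstract, so it is not sketched here.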