Paper title
Molecular Identification from AFM images using the IUPAC Nomenclature and Attribute Multimodal Recurrent Neural Networks
Paper authors
Paper abstract
Despite being the main tool for visualizing molecules at the atomic scale, atomic force microscopy (AFM) with CO-functionalized metal tips cannot chemically identify the molecules it observes. Here we present a strategy to address this challenging task using deep learning techniques. Instead of identifying a finite number of molecules with a traditional classification approach, we frame molecular identification as an image captioning problem. We design an architecture, composed of two multimodal recurrent neural networks, capable of identifying the structure and composition of an unknown molecule from a 3D-AFM image stack. The networks are trained to output the name of each molecule according to the IUPAC nomenclature rules. To train and test this algorithm, we use the novel QUAM-AFM dataset, which contains almost 700,000 molecules and 165 million AFM images. The predictions are highly accurate, achieving a high cumulative 4-gram BLEU score, a common metric in language recognition studies.
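To make the evaluation metric concrete, the following is a minimal sketch of a cumulative BLEU 4-gram score for a single predicted name against a single reference name. It is an illustration under stated assumptions, not the evaluation code of the paper: the function names are hypothetical, no smoothing is applied, and the way the example IUPAC name is split into tokens is an assumption made here, since the abstract does not describe the tokenization.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cumulative_bleu4(reference, candidate):
    """Cumulative BLEU-4 for one reference and one candidate token list.

    Geometric mean of the clipped 1- to 4-gram precisions, multiplied by
    the brevity penalty. No smoothing: a zero precision gives a score of 0.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipping: a candidate n-gram is credited at most as many times
        # as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(sum(log_precisions) / 4)


# Hypothetical tokenization of "2-methylpropan-1-ol" (an assumption).
reference = ["2", "-", "methyl", "prop", "an", "-", "1", "-", "ol"]
exact = list(reference)
one_wrong = ["2", "-", "methyl", "but", "an", "-", "1", "-", "ol"]

print(cumulative_bleu4(reference, exact))      # 1.0 for an exact match
print(cumulative_bleu4(reference, one_wrong))  # strictly between 0 and 1
```

Because the score multiplies the precisions of all four n-gram orders, a single wrong token lowers it noticeably: every n-gram that overlaps the wrong token fails to match, which is why the metric rewards predictions that get long runs of the name correct.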