Paper Title
3DSC - A New Dataset of Superconductors Including Crystal Structures
Paper Authors
Paper Abstract
Data-driven methods, in particular machine learning, can help speed up the discovery of new materials by finding hidden patterns in existing data and using them to identify promising candidate materials. Superconductors are a highly interesting but complex class of materials with many relevant applications, yet the use of data science tools in this field has so far been slowed by a lack of accessible data. In this work, we present a new, publicly available superconductivity dataset ('3DSC'), featuring the critical temperature $T_\mathrm{c}$ of superconducting materials in addition to tested non-superconductors. In contrast to existing databases such as the SuperCon database, which contains information on the chemical composition, the 3DSC is augmented by the approximate three-dimensional crystal structure of each material. We perform a statistical analysis and machine learning experiments to show that access to this structural information improves the prediction of the critical temperature $T_\mathrm{c}$ of materials. Furthermore, we do not regard the 3DSC as a finished dataset; we provide ideas and directions for further research to improve it in multiple ways. We are confident that this database will be useful in applying state-of-the-art machine learning methods to eventually find new superconductors.