Paper title
Lattice Thermal Conductivity Prediction using Symbolic Regression and Machine Learning
Authors
Abstract
Prediction models of lattice thermal conductivity (kL) have wide applications in the discovery of thermoelectrics, thermal barrier coatings, and the thermal management of semiconductors. However, kL is notoriously difficult to predict. While classic models such as the Debye-Callaway model and the Slack model have been used to approximate the kL of inorganic compounds, their accuracy is far from satisfactory. Herein, we propose a genetic programming-based symbolic regression approach for finding explicit kL models and compare it with multi-layer perceptron neural networks and a random forest regressor, using a hybrid cross-validation approach that includes both K-fold cross-validation and holdout validation. Our symbolic regression approach discovered four formulae that outperform the Slack formula as evaluated on our dataset. Through analysis of our models' performance and the generated formulae, we found that the trained formulae successfully reproduce the correct physical law that governs the lattice thermal conductivity of materials. We also found that extrapolation remains a key issue for both symbolic regression and conventional machine learning methods, and that the distribution of the samples plays a key role in training a prediction model with high generalization capability.
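The abstract does not say how the hybrid cross-validation is set up, so the following is only a minimal sketch of one plausible reading: a holdout set is put aside first, and K-fold cross-validation is then run on the remaining samples. The function name `hybrid_split`, the 20% holdout fraction, and K = 5 are illustrative assumptions, not values taken from the paper.

```python
import random


def hybrid_split(n_samples, holdout_fraction=0.2, k=5, seed=0):
    """Split sample indices into K cross-validation folds plus one holdout set.

    Hypothetical helper: the paper's actual splitting procedure is not
    described in the abstract, so the scheme below is an assumption.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")

    # Shuffle once so that the holdout set and the folds are random samples.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Reserve the holdout set; it is never used during cross-validation.
    n_holdout = int(round(n_samples * holdout_fraction))
    holdout_idx = indices[:n_holdout]
    cv_idx = indices[n_holdout:]

    if k < 2 or k > len(cv_idx):
        raise ValueError("k must be between 2 and the number of CV samples")

    # Deal the remaining indices into k disjoint folds of near-equal size.
    fold_members = [cv_idx[i::k] for i in range(k)]

    folds = []
    for i in range(k):
        validation = fold_members[i]
        train = [j for m, members in enumerate(fold_members)
                 if m != i for j in members]
        folds.append((train, validation))
    return folds, holdout_idx


folds, holdout = hybrid_split(100, holdout_fraction=0.2, k=5, seed=42)
```

In this arrangement, each candidate model (a symbolic regression formula, a neural network, or a random forest) would be fitted on every `train` list and scored on the matching `validation` list, with the `holdout` indices reserved for a final check on samples that played no part in model selection.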