Paper title
Copula-based synthetic data augmentation for machine-learning emulators
Paper authors
Paper abstract
Can we improve machine-learning (ML) emulators with synthetic data? If data are scarce or expensive to source and a physical model is available, statistically generated data may be useful for augmenting training sets cheaply. Here we explore the use of copula-based models for generating synthetically augmented datasets in weather and climate by testing the method on a toy physical model of downwelling longwave radiation and corresponding neural network emulator. Results show that for copula-augmented datasets, predictions are improved by up to 62 % for the mean absolute error (from 1.17 to 0.44 W m$^{-2}$).
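To make the idea of copula-based augmentation concrete, the following is a minimal sketch, not the authors' implementation. The abstract says only "copula-based models", so the choice of a Gaussian copula with empirical marginals is an assumption made here for illustration, and all function names (`normal_scores`, `sample_gaussian_copula`, and so on) are hypothetical. The sketch fits the dependence structure of a small table of input variables and draws new synthetic rows from it; in a setting like the one described, such synthetic inputs could then be passed through the physical model to obtain matching targets for the emulator.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def normal_scores(column):
    """Rank-transform one variable to standard-normal scores."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) is a pseudo-observation strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(columns):
    """Pearson correlation between columns of equal length."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]


def cholesky(matrix):
    """Lower-triangular factor; assumes the matrix is positive definite."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - s) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF, interpolating linearly between order statistics."""
    pos = u * (len(sorted_column) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_column) - 1)
    return sorted_column[lo] + (pos - lo) * (sorted_column[hi] - sorted_column[lo])


def sample_gaussian_copula(rows, n_samples, seed=0):
    """Draw synthetic rows that mimic the marginals and rank dependence of `rows`."""
    rng = random.Random(seed)
    columns = [list(c) for c in zip(*rows)]
    # Dependence structure: correlation of the normal scores (Gaussian copula assumption).
    lower = cholesky(correlation_matrix([normal_scores(c) for c in columns]))
    sorted_columns = [sorted(c) for c in columns]
    synthetic = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in columns]
        # Correlated normals, then uniforms, then back to the original marginals.
        z = [sum(lower[i][k] * eps[k] for k in range(i + 1))
             for i in range(len(columns))]
        synthetic.append(tuple(empirical_quantile(s, STD_NORMAL.cdf(v))
                               for s, v in zip(sorted_columns, z)))
    return synthetic
```

Because the marginals are empirical, the synthetic values never leave the range of the observed data; a parametric marginal or a different copula family would behave differently. As a check on the headline figure, (1.17 - 0.44) / 1.17 ≈ 0.62, which is consistent with the stated 62 % reduction in mean absolute error.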