Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval

Paper title

Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval

Authors

Verrelst, Jochem; Dethier, Sara; Rivera, Juan Pablo; Muñoz-Marí, Jordi; Camps-Valls, Gustau; Moreno, José

Abstract

Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for implementation in operational biophysical variable retrieval schemes. However, they have difficulty coping with large training datasets. With the increasing amount of optical remote sensing data made available for analysis and the possibility of using a large amount of simulated data from radiative transfer models (RTMs) to train kernel MLRAs, efficient data reduction techniques will need to be implemented. Active learning (AL) methods make it possible to select the most informative samples in a dataset. This letter introduces six AL methods for achieving optimized biophysical variable estimation with a manageable training dataset, and their implementation into a Matlab-based MLRA toolbox for semi-automatic use. The AL methods were evaluated on how efficiently they improve the estimation accuracy of leaf area index and chlorophyll content based on PROSAIL simulations. Each of the implemented methods outperformed random sampling, improving retrieval accuracy with lower sampling rates. Practically, AL methods open opportunities to feed advanced MLRAs with RTM-generated training data for the development of operational retrieval models.
