Evaluating the feasibility of interpretable machine learning for globular cluster detection

Title

Evaluating the feasibility of interpretable machine learning for globular cluster detection

Authors

Dominik Dold, Katja Fahrion

Abstract

Extragalactic globular clusters (GCs) are important tracers of galaxy formation and evolution. Obtaining GC catalogues from photometric data involves several steps that will likely become too time-consuming to perform on the large data volumes expected from upcoming wide-field imaging projects such as Euclid. In this work, we explore the feasibility of various machine learning (ML) methods to aid the search for GCs. We use archival Hubble Space Telescope data in the F475W and F850LP bands of 141 early-type galaxies in the Fornax and Virgo galaxy clusters. Using existing GC catalogues to label the data, we obtain an extensive data set of 84929 sources containing 18556 GCs, and we train several ML methods both on images and on tabular data containing physically relevant features extracted from the images. We find that our evaluated ML models are capable of producing catalogues of similar quality to the existing ones. The best-performing methods (ensemble-based models such as random forests, and convolutional neural networks) recover ~90-94% of GCs while producing an acceptable fraction of false detections (~6-8%); some of the falsely detected sources can be identified as GCs that were not labelled as such in the catalogues used. In the magnitude range 22 < m_g < 24.5 mag, 98-99% of GCs are recovered. We find similarly high performance even when training on Virgo and evaluating on Fornax data (and vice versa), illustrating that the models are transferable to environments whose conditions, such as distance, differ from those of the training data. Additionally, we demonstrate how interpretable methods can be used to better understand model predictions, recovering that magnitudes, colours, and sizes are important for identifying GCs. These are encouraging results, indicating that similar methods can be applied to create GC catalogues for a large number of galaxies.
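The two headline numbers in the abstract, the fraction of catalogued GCs that are recovered and the fraction of false detections, can be made concrete with a small sketch. It assumes that "recovery" means TP / (TP + FN) and that the false-detection rate means FP / (TP + FP), both measured against the catalogue labels; the paper may define these quantities differently. The function names, the magnitude-window helper, and the toy data are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionMetrics:
    recovery_rate: float         # TP / (TP + FN): fraction of labelled GCs recovered (assumed definition)
    false_detection_rate: float  # FP / (TP + FP): fraction of predicted GCs not labelled as GCs (assumed definition)


def detection_metrics(labels: Sequence[bool], predictions: Sequence[bool]) -> DetectionMetrics:
    """Compare predicted GC flags with catalogue labels (True = GC)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    recovery = tp / (tp + fn) if tp + fn else 0.0
    false_det = fp / (tp + fp) if tp + fp else 0.0
    return DetectionMetrics(recovery, false_det)


def recovery_in_magnitude_range(
    labels: Sequence[bool],
    predictions: Sequence[bool],
    mags: Sequence[float],
    lo: float = 22.0,
    hi: float = 24.5,
) -> float:
    """Recovery rate restricted to sources with lo < mag < hi (hypothetical helper)."""
    kept = [(y, p) for y, p, m in zip(labels, predictions, mags) if lo < m < hi]
    if not kept:
        return 0.0
    sel_labels, sel_preds = zip(*kept)
    return detection_metrics(sel_labels, sel_preds).recovery_rate


if __name__ == "__main__":
    # Toy data (made up for illustration): four labelled GCs, two non-GCs.
    labels = [True, True, True, True, False, False]
    preds = [True, True, True, False, True, False]
    mags = [22.5, 23.0, 24.0, 25.5, 23.5, 21.0]
    print(detection_metrics(labels, preds))
    print(recovery_in_magnitude_range(labels, preds, mags))
```

On the toy data this prints `DetectionMetrics(recovery_rate=0.75, false_detection_rate=0.25)` and then `1.0`, because the one missed GC lies outside the 22-24.5 mag window. Because both rates are computed against catalogue labels, a "false" detection may in fact be a real GC that the catalogue missed, which is the caveat the abstract raises.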
