Paper Title

Semi-Supervised Classification and Clustering Analysis for Variable Stars

Authors

Pantoja, R., Catelan, M., Pichara, K., Protopapas, P.

Abstract

The immense amount of time-series data produced by astronomical surveys has called for the use of machine learning algorithms to discover and classify several million celestial sources. In the case of variable stars, supervised learning approaches have become commonplace. However, these require a large collection of expert-labeled light curves to achieve adequate performance, and such a collection is costly to construct. To address this problem, we introduce two approaches. First, a semi-supervised hierarchical method, which requires substantially less training data than supervised methods. Second, a clustering analysis procedure that finds groups that may correspond to classes or sub-classes of variable stars. Both methods are primarily supported by dimensionality reduction of the data, both for visualization and to avoid the curse of dimensionality. We tested our methods with catalogs collected from the OGLE, CSS, and Gaia surveys. The semi-supervised method reaches a performance of around 90% for all three of our selected catalogs of variable stars while using only 5% of the data for training. This method is suitable for classifying the main classes of variable stars when only a small amount of training data is available. Our clustering analysis confirms that most of the clusters found have a purity over 90% with respect to classes and 80% with respect to sub-classes, suggesting that this type of analysis can be used in large-scale variability surveys as an initial step to identify which classes or sub-classes of variable stars are present in the data and/or to build training sets, among many other possible applications.
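Cluster purity, the figure quoted in the abstract, measures how strongly each cluster is dominated by a single known label. The following is a minimal sketch, assuming purity is computed per cluster as the share of members carrying that cluster's majority label, and overall as the size-weighted total of majority counts divided by the number of objects. The abstract does not spell out the authors' exact definition, and the function names, cluster assignments, and class labels below are hypothetical toy values, not the authors' code or data.

```python
from collections import Counter, defaultdict


def _group_labels(cluster_ids, labels):
    """Collect the known labels of the members of each cluster (hypothetical helper)."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    return members


def cluster_purities(cluster_ids, labels):
    """Per-cluster purity: fraction of members that share the cluster's majority label."""
    members = _group_labels(cluster_ids, labels)
    return {
        cluster: Counter(ys).most_common(1)[0][1] / len(ys)
        for cluster, ys in members.items()
    }


def overall_purity(cluster_ids, labels):
    """Size-weighted purity: majority-label counts summed over clusters, divided by N."""
    members = _group_labels(cluster_ids, labels)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority_total / len(labels)


# Toy example: 9 objects in 3 clusters, with illustrative variable-star class labels.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2]
labels = ["RRab", "RRab", "RRab", "RRc", "EB", "EB", "EB", "Cep", "Cep"]

per_cluster = cluster_purities(clusters, labels)
print(per_cluster)  # {0: 0.75, 1: 1.0, 2: 1.0}
print(round(overall_purity(clusters, labels), 3))  # 0.889

# Share of clusters whose purity reaches a 90% threshold.
share_pure = sum(p >= 0.9 for p in per_cluster.values()) / len(per_cluster)
print(round(share_pure, 3))  # 0.667
```

Reporting both numbers is a deliberate choice in this sketch: a statement such as "most clusters have a purity over 90%" is about the per-cluster values, while the size-weighted overall purity can be pulled down by a single large, mixed cluster.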