Paper Title

Pulsars Detection by Machine Learning with Very Few Features

Authors

Lin, Haitao; Li, Xiangru; Luo, Ziying

Abstract

Developing machine learning (ML) schemes for detecting pulsars is an active research topic, as the data volume of modern surveys grows exponentially. To improve detection performance, the features fed into an ML model should be examined specifically. Existing ML-based pulsar detection research uses mainly two kinds of feature designs: empirical features and statistical features. Because of the combined effects of multiple features, however, the available features contain redundant and even irrelevant components, which can reduce the accuracy of a pulsar detection model. It is therefore essential to select a subset of relevant features from the set of available candidate features, a task known as feature selection. In this work, two feature selection algorithms, Grid Search (GS) and Recursive Feature Elimination (RFE), are proposed to improve detection performance by removing redundant and irrelevant features. The algorithms were evaluated on the southern High Time Resolution Universe survey (HTRU-S) with five pulsar detection models. The experimental results verify the effectiveness and efficiency of the proposed feature selection algorithms. With GS, a model with only two features reaches a recall as high as 99% and a false positive rate (FPR) as low as 0.65%; with RFE, another model with only three features achieves a recall of 99% and an FPR of 0.16% in pulsar candidate classification. Furthermore, this work investigates the number of features required by the models, as well as the pulsars they misclassify.
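To make the two selection strategies and the reported metrics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that GS means exhaustively scoring every feature subset of a given size (the abstract does not spell this out) and that RFE repeatedly drops the feature with the smallest model weight. The nearest-class-mean classifier, the `recall - FPR` score, the synthetic four-feature data, and all function names are illustrative assumptions; the sketch does not use HTRU-S data or the paper's five models.

```python
import random
from itertools import combinations

def recall_and_fpr(y_true, y_pred):
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr

def class_means(X, y, cols):
    """Per-class mean of the selected feature columns (the 'model')."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return means

def predict(means, X, cols):
    """Label each row 1 (pulsar) if it lies closer to the class-1 mean."""
    preds = []
    for x in X:
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, mu))
                for lab, mu in means.items()}
        preds.append(1 if dist[1] < dist[0] else 0)
    return preds

def grid_search(X_tr, y_tr, X_va, y_va, k):
    """Score every k-feature subset on validation data; keep the best."""
    best = None
    for cols in combinations(range(len(X_tr[0])), k):
        means = class_means(X_tr, y_tr, cols)
        recall, fpr = recall_and_fpr(y_va, predict(means, X_va, cols))
        score = recall - fpr  # assumed criterion, not taken from the paper
        if best is None or score > best[0]:
            best = (score, cols, recall, fpr)
    return best

def rfe(X, y, k):
    """Refit, then drop the feature with the smallest |weight| until k remain.

    For a nearest-mean classifier the weight of a feature is the difference
    between its two class means (features assumed to share a common scale).
    """
    cols = list(range(len(X[0])))
    while len(cols) > k:
        means = class_means(X, y, cols)
        weights = [abs(a - b) for a, b in zip(means[1], means[0])]
        cols.pop(weights.index(min(weights)))
    return cols

def make_data(n, rng):
    """Synthetic candidates: features 0 and 2 informative, 1 and 3 noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 4.0 * label
        X.append([rng.gauss(shift, 1), rng.gauss(0, 1),
                  rng.gauss(shift, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(600, rng)
X_valid, y_valid = make_data(600, rng)

best_score, best_cols, best_recall, best_fpr = grid_search(
    X_train, y_train, X_valid, y_valid, k=2)
kept = rfe(X_train, y_train, k=2)
print("grid search:", best_cols, "recall", best_recall, "FPR", best_fpr)
print("RFE keeps:", kept)
```

The exhaustive search costs one model fit per subset, so it is only practical for small subset sizes, while the elimination loop needs one fit per removed feature. That difference in cost is the usual reason for considering both strategies.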
