Paper title
Probabilistic learning for pulsar classification
Paper authors
Paper abstract
In this work, we explore the use of probabilistic learning to identify pulsar candidates. We make use of Deep Gaussian Process (DGP) and Deep Kernel Learning (DKL). Trained on a balanced training set to avoid the effects of class imbalance, the models achieve very promising overall performance, distinguishing the positive class from the negative one with relatively high probability (ROC-AUC $\sim 0.98$). We estimate the predictive entropy of each model's predictions and find that DKL is more confident in its predictions than DGP and provides better uncertainty calibration. Investigating the effect of training on an imbalanced dataset, we find that each model's performance decreases as the number of majority-class examples in the training set increases. Interestingly, even with $10\times$ as many negative-class as positive-class examples, the models still provide reasonably well-calibrated uncertainty, i.e. an expected Uncertainty Calibration Error (UCE) below $6\%$. We also show how a convolutional neural network (CNN) based classifier trained via Bayesian Active Learning by Disagreement (BALD) performs when only a relatively small training dataset is available. We find that, with an optimized number of training examples, this model, which is the most confident in its predictions, generalizes relatively well and produces the best uncertainty calibration, with a UCE of $3.118\%$.
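The abstract relies on three uncertainty quantities: predictive entropy, the BALD acquisition score, and UCE. Below is a minimal NumPy sketch of all three. It is not the authors' implementation. It assumes that predictions arrive as T stochastic forward passes (for example, MC dropout or posterior samples) stored in an array of shape (T, N, C). It also assumes a common binned UCE definition: normalized entropy in [0, 1] is split into equal-width bins, and each bin's absolute gap between error rate and mean uncertainty is weighted by its share of examples. The function names and the default of 10 bins are hypothetical choices.

```python
import numpy as np


def _entropy(p: np.ndarray, axis: int) -> np.ndarray:
    # Shannon entropy; clipping avoids log(0) while keeping 0 * log(0) = 0.
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of the mean predictive distribution.

    probs: shape (T, N, C), i.e. T stochastic passes, N examples, C classes.
    Returns shape (N,).
    """
    return _entropy(probs.mean(axis=0), axis=1)


def bald_score(probs: np.ndarray) -> np.ndarray:
    """BALD score: mutual information between prediction and model parameters,
    computed as predictive entropy minus the expected per-pass entropy."""
    expected_entropy = _entropy(probs, axis=2).mean(axis=0)
    return predictive_entropy(probs) - expected_entropy


def uce(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Binned uncertainty calibration error (assumed definition, see lead-in)."""
    n_classes = probs.shape[2]
    mean_p = probs.mean(axis=0)
    # Normalize entropy by log(C) so that uncertainty lies in [0, 1].
    unc = predictive_entropy(probs) / np.log(n_classes)
    err = (mean_p.argmax(axis=1) != labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0 .. n_bins - 1.
    idx = np.clip(np.digitize(unc, edges[1:-1]), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Weight |error rate - mean uncertainty| by the bin's share of examples.
            total += mask.mean() * abs(err[mask].mean() - unc[mask].mean())
    return float(total)


if __name__ == "__main__":
    # Two passes that disagree completely on one example: BALD = log 2.
    disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
    print(bald_score(disagree))
    # Maximally uncertain predictions, half of them wrong: UCE = |0.5 - 1| = 0.5.
    flat = np.full((1, 4, 2), 0.5)
    print(uce(flat, np.array([0, 1, 0, 1])))
```

In an active-learning loop of the BALD type, the unlabeled examples with the highest `bald_score` would be the ones queried for labels next. Under the assumed definition, a lower `uce` means that the model's uncertainty tracks its actual error rate more closely.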