Paper Title
Robust Deep Semi-Supervised Learning: A Brief Introduction
Authors
Abstract
Semi-supervised learning (SSL) is the branch of machine learning that aims to improve learning performance by leveraging unlabeled data when labels are insufficient. Recently, SSL with deep models has proven successful on standard benchmark tasks. However, these methods remain vulnerable to various robustness threats in real-world applications, because the benchmarks provide perfect unlabeled data, whereas in realistic scenarios unlabeled data may be corrupted. Many researchers have pointed out that SSL suffers severe performance degradation after exploiting corrupted unlabeled data. Thus, there is an urgent need for SSL algorithms that work robustly with corrupted unlabeled data. To fully understand robust SSL, we conduct a survey study. We first give a formal definition of robust SSL from the perspective of machine learning. We then classify the robustness threats into three categories: i) distribution corruption, i.e., the unlabeled data distribution is mismatched with that of the labeled data; ii) feature corruption, i.e., the features of unlabeled examples are adversarially attacked; and iii) label corruption, i.e., the label distribution of the unlabeled data is imbalanced. Under this unified taxonomy, we provide a thorough review and discussion of recent works that focus on these issues. Finally, we propose promising directions within robust SSL to provide insights for future research.