Paper Title
Data-Efficient Learning via Minimizing Hyperspherical Energy
Paper Authors
Abstract
Deep learning on large-scale data is dominant nowadays. The unprecedented scale of data has arguably been one of the most important driving forces behind the success of deep learning. However, there are still scenarios where collecting data or labels can be extremely expensive, e.g., medical imaging and robotics. To fill this gap, this paper considers the problem of data-efficient learning from scratch using a small amount of representative data. First, we characterize this problem as active learning on homeomorphic tubes of spherical manifolds. This naturally yields a feasible hypothesis class. With homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of data-efficient learning applications, including deep clustering, distribution matching, version space sampling, and deep active learning.
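To give a rough sense of what "minimizing hyperspherical energy" can mean in practice, the sketch below computes a Riesz-style energy (the sum of inverse pairwise distances between points projected onto the unit sphere) and greedily picks a low-energy, well-spread subset. This is not the MHEAL algorithm from the paper: the energy form, the exponent `s`, the greedy rule, and the function names `hyperspherical_energy` and `greedy_low_energy_subset` are all assumptions made for illustration.

```python
import numpy as np


def _pairwise_sphere_distances(points):
    """Project rows onto the unit sphere and return pairwise Euclidean distances.

    Assumes every row is a nonzero vector (illustrative assumption).
    """
    x = np.asarray(points, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def hyperspherical_energy(points, s=1.0, eps=1e-12):
    """Sum of 1 / distance**s over all unordered pairs of projected points."""
    dist = _pairwise_sphere_distances(points)
    upper = np.triu_indices(dist.shape[0], k=1)  # each pair counted once
    return float(np.sum(1.0 / (dist[upper] + eps) ** s))


def greedy_low_energy_subset(points, k, s=1.0, eps=1e-12):
    """Hypothetical greedy heuristic: start from index 0, then repeatedly add
    the point that increases the subset's energy the least."""
    dist = _pairwise_sphere_distances(points)
    selected = [0]
    while len(selected) < k:
        # Energy each candidate would add with respect to the chosen points.
        added = np.sum(1.0 / (dist[:, selected] + eps) ** s, axis=1)
        added[selected] = np.inf  # never pick the same point twice
        selected.append(int(np.argmin(added)))
    return selected
```

Lower energy corresponds to points that are spread farther apart on the sphere, which is one intuitive reading of selecting "a small amount of representative data"; the paper's own formulation and guarantees should be taken from the paper itself.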