Paper Title

Distributionally Robust Weighted $k$-Nearest Neighbors

Paper Authors

Shixiang Zhu, Liyan Xie, Minghe Zhang, Rui Gao, Yao Xie

Paper Abstract

Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that captures similarities between samples. When the samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors, which aims to find the optimal weighted $k$-NN classifiers that hedge against feature uncertainties. We develop an algorithm, \texttt{Dr.k-NN}, that efficiently solves this functional optimization problem and assigns minimax optimal weights to training samples when performing classification. These weights are class-dependent, and are determined by the similarities of sample features under the least favorable scenarios. When the size of the uncertainty set is properly tuned, the robust classifier has a smaller Lipschitz norm than the vanilla $k$-NN, and thus improves the generalization capability. We also couple our framework with neural-network-based feature embedding. We demonstrate the competitive performance of our algorithm compared to the state-of-the-art in the few-training-sample setting with various real-data experiments.
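Schematically, a minimax distributionally robust formulation of the kind described above takes the form

$$\min_{f} \; \sup_{Q \in \mathcal{U}(\hat{P})} \; \mathbb{E}_{Q}\left[\ell\big(f(X), Y\big)\right],$$

where $\hat{P}$ is the empirical distribution of the training samples, $\mathcal{U}(\hat{P})$ is an uncertainty set of feature distributions around it (whose size controls the robustness trade-off noted above), and $\ell$ is a classification loss. This generic form, and the symbols $\mathcal{U}$, $\hat{P}$, and $\ell$, are given here for orientation only and are not quoted from the paper.

For a concrete point of reference, the sketch below implements a plain weighted $k$-NN decision rule, the baseline that \texttt{Dr.k-NN} robustifies. The function name \texttt{weighted\_knn\_predict}, the Gaussian similarity kernel, and the \texttt{bandwidth} parameter are illustrative assumptions; \texttt{Dr.k-NN} instead obtains class-dependent, minimax optimal weights by solving the functional optimization against the least favorable distributions, which is not reproduced here.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=5, bandwidth=1.0):
    """Vanilla weighted k-NN: vote for the class with the largest total
    similarity weight among the k nearest training samples.

    Illustrative baseline only; Dr.k-NN replaces the kernel weights
    below with class-dependent, minimax optimal weights.
    """
    # Euclidean distances from the query to every training sample.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbors.
    nn = np.argsort(dists)[:k]
    # Gaussian-kernel similarity weights (an assumed, common choice).
    w = np.exp(-dists[nn] ** 2 / (2.0 * bandwidth ** 2))
    # Accumulate weight per class; predict the highest-scoring class.
    scores = {c: w[y_train[nn] == c].sum() for c in np.unique(y_train)}
    return max(scores, key=scores.get)

# Toy usage: two classes in the plane; the query lies nearest class 0.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(weighted_knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # -> 0
```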
