Paper Title
Optimizing Diffusion Rate and Label Reliability in a Graph-Based Semi-supervised Classifier
Paper Authors
Paper Abstract
Semi-supervised learning has received attention from researchers, as it allows one to exploit the structure of unlabeled data to achieve competitive classification results with far fewer labels than supervised approaches. The Local and Global Consistency (LGC) algorithm is one of the best-known graph-based semi-supervised learning (GSSL) classifiers. Notably, its solution can be written as a linear combination of the known labels. The coefficients of this linear combination depend on a parameter $\alpha$, which determines how the reward for reaching a labeled vertex in a random walk decays over time. In this work, we discuss how removing the self-influence of a labeled instance may be beneficial, and how this relates to leave-one-out error. Moreover, we propose to minimize this leave-one-out loss with automatic differentiation. Within this framework, we propose methods to estimate label reliability and diffusion rate; optimizing the diffusion rate is accomplished more efficiently with a spectral representation. Results show that the label reliability approach competes with robust L1-norm methods, and that removing diagonal entries reduces the risk of overfitting and leads to suitable criteria for parameter selection.
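To illustrate the linear-combination view mentioned in the abstract, the sketch below computes the standard LGC closed-form solution $F = (1-\alpha)(I - \alpha S)^{-1}Y$ with $S = D^{-1/2} W D^{-1/2}$, and optionally zeroes the diagonal of the propagation matrix so that a labeled instance does not contribute to its own prediction (the leave-one-out-style correction discussed in the paper). This is a minimal sketch assuming NumPy, a precomputed symmetric affinity matrix `W`, and one-hot labels `Y`; the function name `lgc_propagation` and the `remove_self_influence` flag are hypothetical, and the paper's reliability and diffusion-rate estimation via automatic differentiation is not shown.

```python
import numpy as np

def lgc_propagation(W, Y, alpha=0.9, remove_self_influence=True):
    """Sketch of the LGC closed form F = (1 - alpha) * (I - alpha * S)^{-1} Y.

    W : (n, n) symmetric affinity matrix with zero diagonal.
    Y : (n, c) one-hot label matrix; rows of unlabeled points are all zeros.
    Returns the label scores F and the propagation matrix P, whose rows give
    the coefficients of the linear combination of known labels.
    """
    n = W.shape[0]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt                            # normalized affinity
    P = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * S)   # propagation matrix
    if remove_self_influence:
        np.fill_diagonal(P, 0.0)                               # drop each instance's self-contribution
    F = P @ Y                                                  # linear combination of the known labels
    return F, P

# Toy usage: 4 points, 2 classes, only the first two points labeled.
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.3, 0.1],
              [0.2, 0.3, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
F, P = lgc_propagation(W, Y, alpha=0.9)
print(F.argmax(axis=1))  # predicted class index per point
```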