Paper Title
Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks
Paper Authors
Paper Abstract
Existing knowledge distillation methods on graph neural networks (GNNs) are mostly offline: the student model extracts knowledge from a powerful teacher model to improve its performance. However, a pre-trained teacher model is not always accessible due to training cost, privacy concerns, etc. In this paper, we propose a novel online knowledge distillation framework to resolve this problem. Specifically, each student GNN model learns the extracted local structure from another simultaneously trained counterpart in an alternating training procedure. We further develop a cross-layer distillation strategy by aligning one student layer ahead with a layer at a different depth in another student model, which theoretically spreads the structure information over all layers. Experimental results on five datasets, including PPI, Coauthor-CS/Physics, and Amazon-Computer/Photo, demonstrate that student performance is consistently boosted in our collaborative training framework without supervision from a pre-trained teacher model. In addition, we find that our alignahead technique accelerates model convergence and that its effectiveness can generally be improved by increasing the number of students in training. Code is available at: https://github.com/GuoJY-eatsTG/Alignahead
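To make the cross-layer "alignahead" idea more concrete, the following is a minimal PyTorch sketch, not the paper's exact formulation. It assumes two student GNNs whose forward passes return their per-layer node features plus logits; the edge-wise cosine similarity used as the local-structure representation, the cyclic one-layer shift, the helper names (local_structure, alignahead_loss, train_step_a), and the loss weight alpha are illustrative assumptions.

```python
# Minimal sketch of online cross-layer (alignahead) distillation between two students.
# Assumptions: each student's forward returns (list of per-layer node features, logits);
# edge_index is a 2 x E LongTensor of graph edges.
import torch
import torch.nn.functional as F

def local_structure(h, edge_index):
    # Local-structure representation: cosine similarity between the two endpoints
    # of every edge, giving one scalar per edge.
    src, dst = edge_index
    return F.cosine_similarity(h[src], h[dst], dim=-1)

def alignahead_loss(feats_a, feats_b, edge_index):
    # Align layer l of student A with layer l+1 of student B ("aligning ahead"),
    # cycling back to layer 0 so structural signal reaches every layer.
    num_layers = len(feats_a)
    loss = 0.0
    for l in range(num_layers):
        s_a = local_structure(feats_a[l], edge_index)
        s_b = local_structure(feats_b[(l + 1) % num_layers], edge_index)
        loss = loss + F.mse_loss(s_a, s_b.detach())  # B's structures act as targets here
    return loss / num_layers

def train_step_a(student_a, student_b, x, edge_index, y, optimizer_a, alpha=1.0):
    # One alternating-training step: update student A against a frozen snapshot of
    # student B's structures; the symmetric step for B is analogous and omitted.
    student_a.train()
    feats_a, logits_a = student_a(x, edge_index)
    with torch.no_grad():
        feats_b, _ = student_b(x, edge_index)
    loss = F.cross_entropy(logits_a, y) + alpha * alignahead_loss(feats_a, feats_b, edge_index)
    optimizer_a.zero_grad()
    loss.backward()
    optimizer_a.step()
    return loss.item()
```

In this sketch, alternating the analogous update for student B lets both peers improve without any pre-trained teacher, matching the collaborative setting described in the abstract.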