Paper Title

MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings

Paper Authors

Kai Wang, Yu Liu, Qian Ma, Quan Z. Sheng

Paper Abstract

Link prediction based on knowledge graph embeddings (KGE) aims to predict new triples to automatically construct knowledge graphs (KGs). However, recent KGE models achieve performance improvements by excessively increasing the embedding dimensions, which may cause enormous training costs and require more storage space. In this paper, instead of training high-dimensional models, we propose MulDE, a novel knowledge distillation framework, which includes multiple low-dimensional hyperbolic KGE models as teachers and two student components, namely Junior and Senior. Under a novel iterative distillation strategy, the Junior component, a low-dimensional KGE model, asks teachers actively based on its preliminary prediction results, and the Senior component integrates teachers' knowledge adaptively to train the Junior component based on two mechanisms: relation-specific scaling and contrast attention. The experimental results show that MulDE can effectively improve the performance and training speed of low-dimensional KGE models. The distilled 32-dimensional model is competitive compared to the state-of-the-art high-dimensional methods on several widely-used datasets.
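
To make the training loop described in the abstract concrete, below is a minimal, hypothetical sketch of MulDE-style multi-teacher distillation. It is based only on the abstract, not the authors' released code: the names TinyKGE, distill_step, rel_scale, top_k, and temp are all made up, the teachers here are toy DistMult-style scorers rather than hyperbolic KGE models, and the relation-specific scaling and contrast attention of the Senior component are reduced to fixed tensors and a simple softmax over teacher scores.

```python
# Hypothetical sketch of MulDE-style multi-teacher distillation for low-dimensional KGE.
# Simplified from the abstract; not the authors' implementation.
import torch
import torch.nn.functional as F


class TinyKGE(torch.nn.Module):
    """Toy low-dimensional KGE scorer (DistMult-style), standing in for both
    the Junior student and each pre-trained teacher."""

    def __init__(self, num_entities, num_relations, dim=32):
        super().__init__()
        self.ent = torch.nn.Embedding(num_entities, dim)
        self.rel = torch.nn.Embedding(num_relations, dim)

    def score(self, h, r):
        # Score every candidate tail entity: (batch, num_entities)
        return (self.ent(h) * self.rel(r)) @ self.ent.weight.t()


def distill_step(junior, teachers, rel_scale, h, r, t, top_k=64, temp=2.0):
    """One iterative-distillation step: the Junior 'asks' the teachers about its
    own top-K candidates; a Senior-style aggregation (relation-specific scaling
    plus attention over teachers) builds the soft targets."""
    junior_scores = junior.score(h, r)                        # (B, E)
    _, cand = junior_scores.topk(top_k, dim=-1)               # Junior's query: its top-K candidates

    with torch.no_grad():
        # Each teacher scores only the queried candidates.
        t_scores = torch.stack(
            [tc.score(h, r).gather(1, cand) for tc in teachers], dim=0
        )                                                      # (T, B, K)
        # Relation-specific scaling per teacher (fixed here; learned by Senior in the paper).
        scale = rel_scale[:, r].unsqueeze(-1)                  # (T, B, 1)
        scaled = t_scores * scale
        # Simplified contrast-attention: softmax weighting over teachers.
        att = F.softmax(scaled.mean(dim=-1), dim=0).unsqueeze(-1)      # (T, B, 1)
        soft_targets = F.softmax((att * scaled).sum(dim=0) / temp, dim=-1)

    # Distillation loss on the candidate subset plus the usual hard-label loss.
    log_student = F.log_softmax(junior_scores.gather(1, cand) / temp, dim=-1)
    kd_loss = F.kl_div(log_student, soft_targets, reduction="batchmean")
    ce_loss = F.cross_entropy(junior_scores, t)
    return kd_loss + ce_loss


# Example usage with toy sizes (all values hypothetical):
junior = TinyKGE(num_entities=1000, num_relations=20, dim=32)
teachers = [TinyKGE(1000, 20, dim=32) for _ in range(4)]
rel_scale = torch.ones(len(teachers), 20)
h = torch.randint(0, 1000, (8,))
r = torch.randint(0, 20, (8,))
t = torch.randint(0, 1000, (8,))
loss = distill_step(junior, teachers, rel_scale, h, r, t)
loss.backward()
```

The key cost-saving idea reflected in the sketch is that teachers only score the small candidate set the Junior asks about, rather than all entities, which keeps teacher inference cheap during the iterative distillation.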
