论文标题
FedWalk:沟通有效的联合无监督的节点嵌入具有差异隐私的嵌入
FedWalk: Communication Efficient Federated Unsupervised Node Embedding with Differential Privacy
论文作者
论文摘要
节点嵌入旨在将复杂图中的节点映射到低维表示中。实际的大规模图和标记的困难激发了无监督节点嵌入问题的广泛研究。但是,以前的工作主要在给出完整图的集中设置中运行。随着对数据隐私的认识的越来越多,只知道一个顶点及其邻居的数据持有人需要更大的隐私保护。在本文中,我们介绍了FedWalk,这是一种基于随机步行的无监督节点嵌入算法,该算法在此类节点级别可见度图中运行,并在本地保留原始图。 FedWalk旨在提供具有数据隐私保护和出色的沟通效率的集中竞争图表能力。 FedWalk实例化了流行的联合范式,并包含三个模块。我们首先设计一个分层聚类树(HCT)构造函数,以提取每个节点的结构特征。动态的时间扭曲算法无缝处理不同节点的结构异质性。然后,根据构造的HCT,我们设计了一个随机步行生成器,其中序列编码器旨在保留隐私,并且设计了两个跳跃的邻居预测器来节省通信成本。然后,生成的随机步道用于基于Skipgram模型更新节点嵌入。在两个大图上进行了广泛的实验表明,作为一种集中式节点嵌入算法,FED-WALK仅具有高达1.8%的Micro-F1分数和4.4%Marco-F1得分损失,同时每次步行间隔约6.7倍。
Node embedding aims to map nodes in the complex graph into low-dimensional representations. The real-world large-scale graphs and difficulties of labeling motivate wide studies of unsupervised node embedding problems. Nevertheless, previous effort mostly operates in a centralized setting where a complete graph is given. With the growing awareness of data privacy, data holders who are only aware of one vertex and its neighbours demand greater privacy protection. In this paper, we introduce FedWalk, a random-walk-based unsupervised node embedding algorithm that operates in such a node-level visibility graph with raw graph information remaining locally. FedWalk is designed to offer centralized competitive graph representation capability with data privacy protection and great communication efficiency. FedWalk instantiates the prevalent federated paradigm and contains three modules. We first design a hierarchical clustering tree (HCT) constructor to extract the structural feature of each node. A dynamic time warping algorithm seamlessly handles the structural heterogeneity across different nodes. Based on the constructed HCT, we then design a random walk generator, wherein a sequence encoder is designed to preserve privacy and a two-hop neighbor predictor is designed to save communication cost. The generated random walks are then used to update node embedding based on a SkipGram model. Extensive experiments on two large graphs demonstrate that Fed-Walk achieves competitive representativeness as a centralized node embedding algorithm does with only up to 1.8% Micro-F1 score and 4.4% Marco-F1 score loss while reducing about 6.7 times of inter-device communication per walk.