Hierarchical Estimation for Effective and Efficient Sampling Graph Neural Network

Paper Title

Hierarchical Estimation for Effective and Efficient Sampling Graph Neural Network

Authors

Li, Yang; Xu, Bingbing; Cao, Qi; Yuan, Yige; Shen, Huawei

Abstract

Improving the scalability of GNNs is critical for large graphs. Existing methods leverage three sampling paradigms (node-wise, layer-wise, and subgraph sampling) and then design unbiased estimators for scalability. However, high variance still severely hinders GNNs' performance. Because previous studies either lack variance analysis or focus only on a particular sampling paradigm, we first propose a unified node sampling variance analysis framework and analyze the core challenge in deriving the minimum-variance sampler, "circular dependency": the sampling probability depends on node embeddings, while node embeddings cannot be calculated until sampling is finished. Existing studies either ignore the node embeddings or introduce external parameters, so a variance reduction method that is both efficient and effective is still lacking. Therefore, we propose the Hierarchical Estimation based Sampling GNN (HE-SGNN), in which the first level estimates the node embeddings in the sampling probability to break the circular dependency, and the second level employs a sampling GNN operator to estimate the nodes' representations on the entire graph. Considering the technical differences, we propose different first-level estimators, i.e., a time series simulation for layer-wise sampling and a feature-based simulation for subgraph sampling. Experimental results on seven representative datasets demonstrate the effectiveness and efficiency of our method.
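The "circular dependency" described in the abstract can be illustrated with a generic importance-sampling sketch. This is not the HE-SGNN method itself: it only shows, under textbook assumptions, that an unbiased sampled neighbor aggregation has its lowest variance when the sampling probability is proportional to the aggregation weight times the embedding norm, which is exactly the quantity that is unknown before sampling. All names here (`sampled_aggregate`, `single_sample_variance`, the synthetic `weights` and `embeddings`) are hypothetical and chosen for illustration.

```python
import numpy as np

def sampled_aggregate(weights, embeddings, probs, num_samples, rng):
    """Unbiased estimate of sum_j weights[j] * embeddings[j] by importance sampling."""
    idx = rng.choice(len(weights), size=num_samples, p=probs)
    scale = weights[idx] / probs[idx]  # reweight each draw to keep the estimate unbiased
    return (scale[:, None] * embeddings[idx]).mean(axis=0)

def single_sample_variance(weights, embeddings, probs):
    """Exact total variance (trace of the covariance) of a one-sample estimate."""
    exact = weights @ embeddings
    second_moment = np.sum(weights**2 * np.sum(embeddings**2, axis=1) / probs)
    return second_moment - exact @ exact

# Synthetic data (assumption): one target node with 50 neighbors and 8-dim embeddings.
rng = np.random.default_rng(0)
num_neighbors, dim = 50, 8
weights = rng.random(num_neighbors) + 0.1
embeddings = rng.normal(size=(num_neighbors, dim)) * rng.random((num_neighbors, 1)) * 5

# Embedding-agnostic sampler: ignores the node embeddings entirely.
uniform = np.full(num_neighbors, 1.0 / num_neighbors)

# Minimum-variance sampler: needs the embedding norms, which in a real
# sampling GNN are only available after sampling has been done.
scores = np.abs(weights) * np.linalg.norm(embeddings, axis=1)
optimal = scores / scores.sum()

var_uniform = single_sample_variance(weights, embeddings, uniform)
var_optimal = single_sample_variance(weights, embeddings, optimal)
estimate = sampled_aggregate(weights, embeddings, optimal, 64, rng)

print(f"variance (uniform sampler): {var_uniform:.3f}")
print(f"variance (norm-aware sampler): {var_optimal:.3f}")
```

In this toy setting the norm-aware sampler never has higher variance than the uniform one, but it can only be built because the embeddings are given up front. The abstract's first-level estimators play the role of supplying an approximation of those embeddings before sampling.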

Paper Title