Stanford / Winter 2021
Consider training a GNN with SGD (taking node classification as the example).
To compute the loss, a random subset of nodes is sampled and backpropagation is run on them.
Standard SGD treats these sampled data points as isolated and independent of one another.
But the nodes of a graph are interconnected and not independent, so this way of computing the loss cannot train GNNs effectively.
Consider full-batch training instead: message passing runs over the entire graph at once, so all node features and the full adjacency must fit into GPU memory, which is infeasible for large graphs.
GraphSAGE Neighbor Sampling: Scaling up GNNs
Stochastic Training of GNNs
Key Insight: To compute embedding of a single node, all we need is the K-hop neighborhood
Given a set of M different nodes in a mini-batch, we can generate their embeddings using M computational graphs
This way, only the nodes inside these computational graphs need to be loaded, which saves GPU memory.
However, the entire K-hop neighborhood still has to be fetched, aggregated, and transformed, so the computation is expensive.
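As a concrete illustration, here is a minimal sketch using PyTorch Geometric's `k_hop_subgraph` utility; the toy graph and seed nodes below are made-up assumptions, not data from the lecture:

```python
# Minimal sketch (assumes PyTorch Geometric): extract the K-hop computational
# graph for a mini-batch of M seed nodes; only these nodes need loading.
import torch
from torch_geometric.utils import k_hop_subgraph

edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])  # toy undirected path 0-1-2-3
seed_nodes = torch.tensor([0, 3])                # the M nodes in the mini-batch
K = 2                                            # number of GNN layers / hops

# subset: all nodes needed to embed the seeds; mapping: seeds' new indices
subset, sub_edge_index, mapping, edge_mask = k_hop_subgraph(
    seed_nodes, num_hops=K, edge_index=edge_index, relabel_nodes=True)

print(subset)          # only these nodes must be loaded onto the GPU
print(sub_edge_index)  # edges of the K-hop computational graph
```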
Neighbor Sampling
For $k = 1, 2, \dots, K$:
For each node in the $K$-hop neighborhood:
(Randomly) sample at most $H_k$ neighbors
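A minimal sketch of this sampling loop, assuming PyTorch Geometric's `NeighborLoader` and a made-up toy graph; `num_neighbors=[10, 5]` plays the role of $H_1$ and $H_2$:

```python
# Minimal sketch (assumes PyTorch Geometric): mini-batch neighbor sampling.
# num_neighbors=[H1, H2] caps the sampled fan-out at hop 1 and hop 2.
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

data = Data(x=torch.randn(1000, 16),                       # toy features
            edge_index=torch.randint(0, 1000, (2, 5000)),  # toy edges
            y=torch.randint(0, 4, (1000,)))                # toy labels

loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],   # sample at most H_1 = 10, then H_2 = 5 neighbors
    batch_size=128,          # M seed nodes per mini-batch
    shuffle=True,
)

for batch in loader:
    # the first batch.batch_size nodes are the seeds; the rest are
    # sampled neighbors forming their pruned computational graphs
    out = batch.x  # placeholder: feed `batch` to a 2-layer GNN, loss on seeds
    break
```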
How to sample the nodes
Random Sampling
Random Walk with Restarts
Natural graphs are “scale-free”: sampling neighbors uniformly at random picks up many low-degree “leaf” nodes
Strategy to sample important nodes
Compute the Random Walk with Restarts score $R_i$ for each node $i$, restarting from the target node (the green node in the lecture slides)
At each level, sample the $H$ neighbors $i$ with the highest $R_i$
This strategy works much better in practice
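A minimal NumPy sketch of this strategy; the toy adjacency matrix, restart probability, and iteration count are illustrative assumptions:

```python
# Minimal sketch (NumPy): Random Walk with Restarts scores via power
# iteration, then keep the H highest-scoring neighbors of the seed node.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # toy undirected graph
P = A / A.sum(axis=0, keepdims=True)        # column-stochastic transitions

def rwr_scores(P, seed, alpha=0.15, iters=50):
    """Solve R = alpha * e_seed + (1 - alpha) * P @ R by power iteration."""
    e = np.zeros(P.shape[0])
    e[seed] = 1.0
    R = e.copy()
    for _ in range(iters):
        R = alpha * e + (1 - alpha) * P @ R
    return R

seed = 0
R = rwr_scores(P, seed)
H = 2
neighbors = np.flatnonzero(A[seed])               # direct neighbors of seed
top_H = neighbors[np.argsort(-R[neighbors])][:H]  # neighbors with highest R_i
print(top_H)
```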
Subgraph Sampling
Issues with Neighbor Sampling
The size of computational graph becomes exponentially large w.r.t. the #GNN layers
Computation is redundant, especially when nodes in a mini-batch share many neighbors
Key Idea
Subgraphs should retain edge connectivity structure of the original graph as much as possible
This way, the GNN over the subgraph generates embeddings closer to the GNN over the original graph
Cluster-GCN
Key Insight: real-world graphs exhibit community structure, so we can partition the graph into communities and use each community's induced subgraph as a mini-batch
Steps
Pre-processing
Given a large graph, partition it into groups of nodes (i.e., node-induced subgraphs)
We can use any scalable community detection methods, e.g., Louvain, METIS
Notice: between-group edges are not included in $G_1, \dots, G_C$
Mini-batch training
Sample one node group at a time. Apply GNN’s message passing over the induced subgraph
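A minimal sketch of both steps, assuming PyTorch Geometric with METIS support (`ClusterData` / `ClusterLoader`) and a made-up toy graph:

```python
# Minimal sketch (assumes PyTorch Geometric + METIS): Cluster-GCN training.
# ClusterData partitions the graph; ClusterLoader with batch_size=1 yields
# the node-induced subgraph of one partition per mini-batch.
import torch
from torch_geometric.data import Data
from torch_geometric.loader import ClusterData, ClusterLoader

data = Data(x=torch.randn(1000, 16),
            edge_index=torch.randint(0, 1000, (2, 5000)),
            y=torch.randint(0, 4, (1000,)))

cluster_data = ClusterData(data, num_parts=50)      # pre-processing (METIS)
loader = ClusterLoader(cluster_data, batch_size=1,  # one node group at a time
                       shuffle=True)

for subgraph in loader:
    # run full GNN message passing over this small induced subgraph
    out = subgraph.x  # placeholder for model(subgraph.x, subgraph.edge_index)
    break
```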
Issues with Cluster-GCN
The induced subgraph removes between-group links
Graph community detection algorithm puts similar nodes together in the same group
The nodes sampled within a single group are therefore not diverse enough to be representative of the entire graph structure
As a result, the gradient averaged over the sampled nodes becomes unreliable
It fluctuates a lot from one node group to another
In other words, the gradient has high variance
Advanced Cluster-GCN
Key Idea
Aggregate multiple node groups per mini-batch
Partition the graph into relatively small groups of nodes
For each mini-batch
Sample and aggregate multiple node groups
Construct the induced subgraph of the aggregated node group
The rest is the same as vanilla Cluster-GCN
Steps
Pre-processing: partition the graph into many relatively small node groups
Mini-batch training
Randomly sample a set of $q$ node groups
Aggregate all nodes across the sampled node groups
Extract the induced subgraph of the aggregated node group (multiple groups form one induced subgraph, so edges between clusters are retained and the gradient is more stable)
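A minimal sketch of one such mini-batch, assuming PyTorch Geometric's `subgraph` utility; the partition assignment `part` and all constants are made-up stand-ins for a real METIS/Louvain output:

```python
# Minimal sketch (assumes PyTorch Geometric): one advanced Cluster-GCN batch.
import torch
from torch_geometric.utils import subgraph

num_nodes, num_groups, q = 1000, 50, 4
edge_index = torch.randint(0, num_nodes, (2, 5000))  # toy edges
part = torch.randint(0, num_groups, (num_nodes,))    # toy node -> group map

# Sample q node groups and aggregate their nodes
groups = torch.randperm(num_groups)[:q]
nodes = torch.isin(part, groups).nonzero(as_tuple=True)[0]

# Induced subgraph over the union: between-group edges among the q sampled
# groups are kept, which stabilizes the gradient
sub_edge_index, _ = subgraph(nodes, edge_index, relabel_nodes=True,
                             num_nodes=num_nodes)
```

Note that PyG's `ClusterLoader` with `batch_size=q` performs this same aggregation of q partitions per mini-batch.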
Simplifying GNNs
Simplify GCN by removing ReLU non-linearity
$$\boldsymbol{H}^{(k+1)} = \widetilde{\boldsymbol{A}} \boldsymbol{H}^{(k)} \boldsymbol{W}_k^{\mathrm{T}}$$
Removing ReLU significantly simplifies GCN! Unrolling all $K$ layers:
$$\boldsymbol{H}^{(K)} = \widetilde{\boldsymbol{A}}^{K} \boldsymbol{X} \boldsymbol{W}^{\mathrm{T}}$$
Let $\widetilde{\boldsymbol{X}} = \widetilde{\boldsymbol{A}}^{K} \boldsymbol{X}$ be a pre-computed matrix.
Simplified GCN's final embedding is
$$\boldsymbol{H}^{(K)} = \widetilde{\boldsymbol{X}} \boldsymbol{W}^{\mathrm{T}}$$
Back to the node embedding form:
$$\boldsymbol{h}_v^{(K)} = \boldsymbol{W} \widetilde{\boldsymbol{X}}_v$$
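A minimal NumPy sketch of the full pipeline; the toy adjacency, features, and $K$ are illustrative assumptions:

```python
# Minimal sketch (NumPy): Simplified GCN. Pre-compute X_tilde = A_tilde^K @ X
# once, then training reduces to an ordinary linear (softmax) classifier.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # toy undirected graph
X = np.random.randn(4, 8)                  # toy node features
K = 2                                      # number of propagation steps

# A_tilde = D^{-1/2} (A + I) D^{-1/2}: normalized adjacency with self-loops
A_hat = A + np.eye(A.shape[0])
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_tilde = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

# Pre-processing: K rounds of feature smoothing, computed once up front
X_tilde = np.linalg.matrix_power(A_tilde, K) @ X

# Training then amounts to multinomial logistic regression on X_tilde,
# e.g. sklearn.linear_model.LogisticRegression on rows of X_tilde
```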
Potential Issue of Simplified GCN: removing the non-linearity reduces its expressive power compared to the original GNNs.
Performance of Simplified GCN: despite being less expressive, it performs comparably to the original GNNs on many semi-supervised node classification benchmarks.
When does Simplified GCN Work? When the graph exhibits homophily: nodes connected by edges tend to share the same label, and the pre-processed features $\widetilde{\boldsymbol{A}}^{K}\boldsymbol{X}$ average features over neighbors, so connected nodes obtain similar embeddings and similar predictions.