
[Paper Reading] | Graph U-Nets


Gao, H., & Ji, S. (2019). Graph U-Nets. arXiv preprint arXiv:1905.05178.

Recently my advisor said that when we present English papers, the slides should be in English as well, and that when we show our interim results we should try to explain them simply, based on our own understanding. So here I am trying to express my understanding of Graph U-Nets from this period in English.



 

1 Introduction

We know that U-Nets can be applied to pixel-wise prediction tasks on images. They mainly consist of "convolution" and "up-convolution" operations, so this paper (Graph U-Nets) aims to develop a similar network structure for graph data.
 
 The authors propose graph pooling (gPool) and unpooling (gUnpool) operations, and based on these two operations they build an encoder-decoder model on graphs, known as the graph U-Nets.

  • graph pooling (gPool): adaptively selects some nodes to form a smaller graph, based on their scalar projection values on a trainable projection vector.
  • graph unpooling (gUnpool): the inverse operation of gPool; it restores the graph to its original structure with the help of the locations of the nodes selected in the corresponding gPool layer.

In the second part of this article, I will summarize some background knowledge, including the structure of U-Nets, the layer-wise GCN, $k$-max pooling in CNNs, and the scalar projection; these will help us understand Graph U-Nets better. The third part presents the main idea of Graph U-Nets: I will use the figures from the original paper to illustrate the graph pooling and unpooling operations, along with the basic structure.


 

2 Background knowledge
2.1 U-Nets

[Figure: the U-Net architecture]

As we all know, CNNs can classify images according to abstract features extracted by convolution operations. A CNN usually includes multiple stacked convolution layers: earlier layers extract simple features, such as edges, while later layers extract more complex ones. So the output of the network is a highly abstract feature of the whole image. But in some cases, we need to assign a label to each pixel in the input image rather than to the whole image.
 
 To this end, U-Nets introduce an "up-convolution" operation in addition to "convolution". A U-Net consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network, and the expansive path upsamples the feature map; the upsampling can be achieved by linear interpolation or by deconvolution. Thanks to this special structure, a U-Net outputs a feature map of the same size as (or slightly smaller than) the input image. The final feature map encodes the features of each pixel, so U-Nets can be used for pixel-wise prediction tasks.
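To make the two upsampling options concrete, here is a minimal PyTorch sketch (the tensor shapes are made-up examples, not from the paper): `F.interpolate` does parameter-free bilinear upsampling, while `ConvTranspose2d` is the learnable "up-convolution".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)   # (batch, channels, height, width)

# Option 1: parameter-free upsampling by bilinear interpolation.
up1 = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(up1.shape)                # torch.Size([1, 8, 32, 32])

# Option 2: "up-convolution" (transposed convolution) with learnable weights.
deconv = nn.ConvTranspose2d(in_channels=8, out_channels=4, kernel_size=2, stride=2)
print(deconv(x).shape)          # torch.Size([1, 4, 32, 32])
```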

 

2.2 GCN

[Figure: the GCN architecture]
The structure of a GCN is similar to that of a CNN: an input layer, multiple graph convolution layers, and an output layer.

  • Input layer: the adjacency matrix $A$ and the feature matrix $X$ of the input graph.
  • Layer-wise graph convolution:
    $$X_{l+1} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X_l W_l)$$
      $\tilde{A} = A + I$ : the input adjacency matrix $A$ with self-loops added.
      $\tilde{D}$ : the degree matrix corresponding to $\tilde{A}$.
      $X_l$ : the feature matrix of layer $l$.
      $W_l$ : a trainable weight matrix that applies a linear transformation to the feature vectors.
  • Output layer: graph convolution aggregates and transforms node features, so the GCN outputs the extracted features after multiple graph convolution layers.
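To make the propagation rule concrete, here is a minimal PyTorch sketch of one GCN layer (the toy graph and feature sizes are my own, not from the paper):

```python
import torch

def gcn_layer(A, X, W):
    """One GCN layer: X_{l+1} = sigma(D~^{-1/2} (A+I) D~^{-1/2} X_l W_l)."""
    A_tilde = A + torch.eye(A.shape[0])           # add self-loops
    d = A_tilde.sum(dim=1)                        # degrees of A~
    D_inv_sqrt = torch.diag(d.pow(-0.5))          # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized adjacency
    return torch.relu(A_hat @ X @ W)              # sigma = ReLU here

# Toy graph: 4 nodes (a cycle), 5 input features, 3 output features.
A = torch.tensor([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
X = torch.randn(4, 5)
W = torch.randn(5, 3, requires_grad=True)         # trainable weight matrix
print(gcn_layer(A, X, W).shape)                   # torch.Size([4, 3])
```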
     
2.3 $k$-max pooling in CNNs

Graph U-Nets imitate the $k$-max pooling of CNNs when performing the graph pooling operation. Max pooling returns the single maximum value, like this:
[Figure: max pooling example]

While $k$-max pooling over a linear sequence of values returns the subsequence of the $k$ maximum values in the sequence, such as $k=2$:
[Figure: 2-max pooling example]
Given a sequence $q$, the $k$-max pooling operation not only pools the $k$ most active features in $q$ but also preserves their relative order; it is insensitive to their specific positions.
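A tiny PyTorch illustration of the difference (the sequence values are made up):

```python
import torch

q = torch.tensor([3., 9., 1., 7., 5.])

# Plain max pooling: the single largest value.
print(q.max())                      # tensor(9.)

# k-max pooling with k = 2: the two largest values, kept in original order.
values, idx = torch.topk(q, k=2)    # positions of the 2 largest values
idx_sorted, _ = torch.sort(idx)     # restore the sequence order
print(q[idx_sorted])                # tensor([9., 7.])
```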
 

2.4 scalar projection

[Figure: scalar projection of v onto u]
Given two vectors $\mathbf{v}$ and $\mathbf{u}$, the scalar projection of $\mathbf{v}$ onto $\mathbf{u}$ is: $\frac{\mathbf{v} \cdot \mathbf{u}}{\left| \mathbf{u} \right|}$
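A two-line PyTorch check of the formula (example vectors chosen for readability):

```python
import torch

v = torch.tensor([3., 4.])
u = torch.tensor([1., 0.])

# Scalar projection of v onto u: (v . u) / |u|
print(torch.dot(v, u) / torch.norm(u))   # tensor(3.)
```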


 

3 Main Idea
3.1 graph pooling layer

When given an input graph, the graph pooling layer adaptively selects a subset of nodes to form a new but smaller graph. To this end, the paper employs a trainable projection vector $\mathbf{p}$: all node features are first projected to 1D, and then $k$-max pooling is performed for node selection.
 Given a node $i$ with feature vector $\mathbf{x}_i$, the scalar projection of $\mathbf{x}_i$ on $\mathbf{p}$ is $y_i = \frac{\mathbf{x}_i \cdot \mathbf{p}}{|\mathbf{p}|}$. Since $y_i$ measures how much information of node $i$ is retained when projected onto the direction of $\mathbf{p}$, we select the nodes with the largest scalar projection values on $\mathbf{p}$; this preserves as much information as possible from the original graph.

Input

  • Graph $G$: 4 nodes, each with 5 features.
  • The adjacency matrix $A \in \mathbb{R}^{4 \times 4}$: each non-zero entry in $A$ represents an edge between two nodes in the graph.
  • The feature matrix $X \in \mathbb{R}^{4 \times 5}$: each row vector $\mathbf{x}_i^l$ in the feature matrix $X^l$ denotes the feature vector of node $i$ in the graph.
    [Figure: gPool layer example]

The layer-wise propagation rule of graph pooling layer $l$:
 1. Compute the scalar projection of $X^l$ on $\mathbf{p}^l$: $\mathbf{y} = [y_1, y_2, y_3, y_4]^T$;

 2. Rank $\mathbf{y}$ and return the $k$ largest ($k=2$) values. In the figure above, the first and third elements of the "top k" vector are selected, so the returned indices are $i_1$ and $i_3$;

 3. Extract the adjacency entries $A^l[i_1][i_1]$, $A^l[i_1][i_3]$, $A^l[i_3][i_1]$, $A^l[i_3][i_3]$ → these form the adjacency matrix of layer $l+1$;

 4. Extract the feature rows $X^l[i_1,:]$ and $X^l[i_3,:]$ → these form $\tilde{X}^l$, i.e. the first and third rows of $X^l$;

 5. Form the feature matrix of layer $l+1$ with a gate: $\tilde{\mathbf{y}} = \mathrm{sigmoid}(\mathbf{y}(\mathrm{idx}))$, $X^{l+1} = \tilde{X}^l \odot \tilde{\mathbf{y}}$.
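Putting the five steps together, here is my own minimal PyTorch sketch of a gPool layer (not the authors' official implementation; shapes follow the 4-node, 5-feature toy example):

```python
import torch

def g_pool(A, X, p, k):
    """Sketch of one gPool layer (my reading of the paper, not official code)."""
    y = X @ p / torch.norm(p)                  # step 1: scalar projections on p
    y_top, idx = torch.topk(y, k)              # step 2: k-max node selection
    A_new = A[idx][:, idx]                     # step 3: extract sub-adjacency
    X_tilde = X[idx]                           # step 4: selected feature rows
    gate = torch.sigmoid(y_top).unsqueeze(1)   # step 5: gate values in (0, 1)
    return A_new, X_tilde * gate, idx          # gated features keep p trainable

# Toy input matching the 4-node, 5-feature example.
A = torch.tensor([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
X = torch.randn(4, 5)
p = torch.randn(5, requires_grad=True)          # trainable projection vector
A2, X2, idx = g_pool(A, X, p, k=2)
print(A2.shape, X2.shape)                       # torch.Size([2, 2]) torch.Size([2, 5])
```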

 

3.2 graph unpooling layer

The graph unpooling layer performs the inverse operation of the gPool layer and restores the graph to its original structure. To achieve this, we record the locations of the nodes selected in the corresponding gPool layer, and use this information to place nodes back at their original positions in the graph.
[Figure: gUnpool layer example]
 In the example, nodes that were not selected in the corresponding gPool layer receive empty (zero) feature vectors in the gUnpool layer.
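A minimal sketch of gUnpool, continuing the toy example above (`idx` is the index vector saved by the corresponding gPool layer; my own sketch, not the official code):

```python
import torch

def g_unpool(X_pooled, idx, num_nodes):
    """Sketch of gUnpool: selected nodes go back to their recorded positions,
    all other nodes get zero (empty) feature vectors."""
    X_full = torch.zeros(num_nodes, X_pooled.shape[1])
    X_full[idx] = X_pooled
    return X_full

# Continuing the gPool example: restore 2 pooled nodes into a 4-node graph.
X_pooled = torch.randn(2, 5)
idx = torch.tensor([0, 2])          # node locations saved by the gPool layer
print(g_unpool(X_pooled, idx, num_nodes=4))
```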

 

3.3 graph u-nets

[Figure: the Graph U-Nets architecture]
Input graph:
 7 nodes, each with 2 features.

GCN layer:
 transforms the input features into low-dimensional representations.

Two encoder blocks, each a gPool layer → a GCN layer:
 gPool layers: reduce the size of the graph to encode higher-order features.
 GCN layers: aggregate information from each node's first-order neighbors.

Two decoder blocks, each a gUnpool layer → a GCN layer:
 gUnpool layers: restore the higher-resolution graph structure.

Skip connections between corresponding encoder and decoder blocks:
  transmit spatial information to the decoder for better performance;
  implemented as either feature-map addition or concatenation.
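To show how these blocks fit together, here is a condensed sketch of the whole forward pass, reusing the `gcn_layer`, `g_pool`, and `g_unpool` sketches above (the `params` dictionary and its key names are my own assumptions, purely for illustration):

```python
import torch

def graph_unet_forward(A, X, params, depth=2):
    """Condensed Graph U-Nets forward pass, reusing the sketches above.
    `params` holds one weight matrix or projection vector per layer (assumed)."""
    X = gcn_layer(A, X, params["W_in"])              # initial GCN embedding
    skips, idxs, adjs = [], [], []
    for i in range(depth):                           # encoder: gPool -> GCN
        skips.append(X)
        adjs.append(A)
        A, X, idx = g_pool(A, X, params[f"p{i}"], k=max(1, A.shape[0] // 2))
        X = gcn_layer(A, X, params[f"W_enc{i}"])
        idxs.append(idx)
    for i in reversed(range(depth)):                 # decoder: gUnpool -> GCN
        A = adjs[i]
        X = g_unpool(X, idxs[i], A.shape[0])
        X = X + skips[i]                             # skip connection (addition)
        X = gcn_layer(A, X, params[f"W_dec{i}"])
    return X
```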

 

3.4 Connectivity Augmentation

Since the related edges are removed along with the removed nodes in gPool, nodes in the pooled graph may become isolated. To increase connectivity among nodes in the pooled graph, the paper uses graph power: the second graph power $A^2 = A^l A^l$ links every pair of nodes within two hops, and the pooled sub-adjacency is extracted from $A^2$ instead of $A^l$.
[Figure: connectivity augmentation via graph power]
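A small PyTorch demonstration of why this helps (a toy path graph of my own): after gPool keeps nodes 0 and 2 of the path 0-1-2-3, the extracted sub-adjacency is empty, but the second graph power reconnects them:

```python
import torch

# Path graph 0-1-2-3; suppose gPool keeps nodes 0 and 2.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
idx = torch.tensor([0, 2])

print(A[idx][:, idx])            # all zeros: the selected nodes are isolated
A2 = (A @ A > 0).float()         # 2nd graph power: links all 2-hop pairs
print(A2[idx][:, idx])           # nodes 0 and 2 are now connected
```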


 

4 Some Questions
  1. We cannot directly apply the $k$-max pooling of CNNs to graphs, since it may result in inconsistent connectivity among the selected nodes. But why is the connectivity of the new graph consistent when the selection is based on a 1D footprint of each node?

  2. How is the projection vector $\mathbf{p}$ trained?

  3. Why does the gate operation make the projection vector trainable by back-propagation?

My English is still really poor; next time I should be more concise and simpler. Keep going~ haha
