Gao, H., & Ji, S. (2019). Graph U-Nets. arXiv preprint arXiv:1905.05178.
Recently my advisor said that when presenting an English paper, the slides should also be in English, and that when showing my recent progress to the group I should try to explain things simply, based on my own understanding. So here I will try to express, in English, my understanding of Graph U-Nets from this period of study.
We know that U-Nets can be applied to image pixel-wise prediction tasks. They mainly consist of "convolution" and "up-convolution" operations. So this paper (Graph U-Nets) wants to develop a similar network structure for graph data.
The authors propose graph pooling (gPool) and unpooling (gUnpool) operations, and based on these two operations they build an encoder-decoder model on graphs, known as the graph U-Nets.
In the second part of this article, I will summarize some background knowledge, including the structure of U-Nets, the layer-wise GCN, $k$-max pooling in CNNs, and the scalar projection. These will help us understand Graph U-Nets better. The third part covers the main idea of Graph U-Nets; I will use the figures in the original paper to show the graph pooling and unpooling operations, along with the basic structure.
As we all know, CNNs can classify images according to the abstract features extracted by the convolution operation. A CNN usually consists of multiple stacked convolution layers: the earlier layers extract simple features, such as edges, while the later layers extract more complex features. So the output of the network is a highly abstract feature of the whole image. But in some cases, we need to assign a label to each pixel of the input image rather than to the whole image.
To this end, U-Nets develop an "up-convolution" operation based on "convolution". A U-Net consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network, and the expansive path consists of upsampling of the feature map. The upsampling can be achieved by linear interpolation or by deconvolution. Due to the special structure of U-Nets, we can obtain a feature map with the same size as (or slightly smaller than) the input image. The final feature map encodes the features of each pixel, so U-Nets can be used for pixel-wise prediction tasks.
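As a small aside, these two upsampling options can be illustrated in PyTorch (the channel and spatial sizes below are made-up example values, not the configuration of any particular U-Net):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)            # (batch, channels, H, W) feature map

# option 1: upsampling by bilinear interpolation, 32x32 -> 64x64
up1 = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# option 2: a learnable transposed convolution, often called "deconvolution"
deconv = torch.nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
up2 = deconv(x)

print(up1.shape, up2.shape)               # both torch.Size([1, 64, 64, 64])
```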
The structure of GCNs is similar to that of CNNs, including an input layer, multiple graph convolution layers, and an output layer.
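For reference, the layer-wise propagation rule of a GCN is, as far as I understand, the one from Kipf & Welling, which is also the rule adopted in the Graph U-Nets paper:

$X^{l+1} = \sigma\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} X^{l} W^{l} \right)$

where $\hat{A} = A + I$ adds self-loops, $\hat{D}$ is the degree matrix of $\hat{A}$, $W^{l}$ is the trainable weight matrix of layer $l$, and $\sigma$ is a nonlinearity such as ReLU.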
Graph U-Nets imitate the $k$-max pooling in CNNs when performing the graph pooling operation. Max pooling returns the single maximum value, while $k$-max pooling over a linear sequence of values returns the subsequence of the $k$ maximum values in the sequence, for example with $k=2$.
Given a sequence $q$, the $k$-max pooling operation not only pools the $k$ most active features in $q$, but also preserves the order of the features; it is insensitive to their specific positions.
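A tiny NumPy sketch of $k$-max pooling over a 1D sequence (the sequence values and $k=2$ are made-up example numbers):

```python
import numpy as np

def k_max_pooling(q, k):
    """Return the k largest values of the 1-D sequence q,
    keeping their original (left-to-right) order."""
    q = np.asarray(q)
    # indices of the k largest values, re-sorted by original position
    idx = np.sort(np.argsort(q)[-k:])
    return q[idx]

print(k_max_pooling([1, 5, 3, 9], k=2))   # -> [5 9]
```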
Given two vectors $\mathbf{v}$ and $\mathbf{u}$, the scalar projection of vector $\mathbf{v}$ on vector $\mathbf{u}$ is:
$\frac{\mathbf{v}\cdot \mathbf{u}}{\left| \mathbf{u} \right|}$
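For example (my own toy numbers): with $\mathbf{v} = (3, 4)$ and $\mathbf{u} = (1, 0)$, the scalar projection is $\frac{3 \times 1 + 4 \times 0}{1} = 3$, i.e. the length of the "shadow" that $\mathbf{v}$ casts on the direction of $\mathbf{u}$.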
When given an input graph, the graph pooling layer adaptively selects a subset of nodes to form a new but smaller graph. To this end, the paper employs a trainable projection vector $\mathbf{p}$: all node features are first projected to 1D, and then $k$-max pooling is performed for node selection.
Given a node $i$ with feature vector $\mathbf{x}_i$, the scalar projection of $\mathbf{x}_i$ on $\mathbf{p}$ is $y_i = \frac{\mathbf{x}_i \cdot \mathbf{p}}{|\mathbf{p}|}$. Because $y_i$ measures how much information of node $i$ can be retained when projected onto the direction of $\mathbf{p}$, we select the nodes with the largest scalar projection values on $\mathbf{p}$; this preserves as much information as possible from the original graph.
Taking a 4-node example graph as input, the layer-wise propagation rule of the graph pooling layer $l$ is:
1. Compute the scalar projection of $X^l$ on $\mathbf{p}^l$: $\mathbf{y} = [y_1, y_2, y_3, y_4]^T$;
2. Rank $\mathbf{y}$ and return the $k$ largest ($k=2$) values in $\mathbf{y}$. In the figure, the first and third elements of the "top k" vector are selected, so the returned indices are $i_1$ and $i_3$;
3. Extract the adjacency matrix: $A^l[i_1][i_1]$, $A^l[i_1][i_3]$, $A^l[i_3][i_1]$, $A^l[i_3][i_3]$ → these form the adjacency matrix of layer $l+1$;
4. Extract the feature matrix: $X^l[i_1,:]$, $X^l[i_3,:]$ → these form $\tilde{X}^l$, i.e. the first and third rows of $X^l$.
5. Form the feature matrix of layer $l+1$:
$\tilde{\mathbf{y}} = \mathrm{sigmoid}(\mathbf{y}(\text{idx}))$
$X^{l+1} = \tilde{X}^l \odot \tilde{\mathbf{y}}$
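To make the five steps concrete, here is a minimal PyTorch sketch of the gPool forward pass as I understand it (the function name gpool, the toy 4-node graph, and $k=2$ are my own illustration choices, not code from the paper):

```python
import torch

def gpool(A, X, p, k):
    """Sketch of the gPool forward pass described in steps 1-5.
    A: (N, N) adjacency matrix, X: (N, C) node features,
    p: (C,) trainable projection vector, k: number of nodes to keep."""
    # 1. scalar projection of every node feature onto p
    y = X @ p / p.norm()                 # (N,)
    # 2. k-max pooling: keep the k nodes with the largest scores
    _, idx = torch.topk(y, k)
    idx, _ = torch.sort(idx)             # keep the original node order
    # 3. adjacency matrix of the pooled (smaller) graph
    A_next = A[idx][:, idx]              # (k, k)
    # 4. feature matrix of the selected nodes
    X_tilde = X[idx]                     # (k, C)
    # 5. gate: scale the selected features by sigmoid(y[idx])
    y_tilde = torch.sigmoid(y[idx])
    X_next = X_tilde * y_tilde.unsqueeze(-1)
    return A_next, X_next, idx

# toy example with 4 nodes, 2 features per node, k = 2
A = torch.eye(4)
X = torch.randn(4, 2)
p = torch.randn(2, requires_grad=True)   # trainable in practice
A2, X2, idx = gpool(A, X, p, k=2)
print(A2.shape, X2.shape, idx)           # torch.Size([2, 2]) torch.Size([2, 2]) ...
```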
The graph unpooling layer performs the inverse operation of the gPool layer and restores the graph to its original structure. To achieve this, we record the locations of the nodes selected in the corresponding gPool layer and use this information to place nodes back at their original positions in the graph.
In the example, for nodes that are not selected in the corresponding gPool layer, their feature vectors in the gUnpool layer are empty (zero vectors).
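A correspondingly minimal sketch of gUnpool, which simply scatters the pooled features back to the recorded positions and fills the unselected nodes with zero vectors (again my own illustration; idx is the index tensor returned by the gpool sketch above):

```python
import torch

def gunpool(X_small, idx, num_nodes):
    """Place pooled node features back at their recorded positions;
    nodes that were not selected get all-zero feature vectors."""
    X_big = torch.zeros(num_nodes, X_small.size(1))
    X_big[idx] = X_small
    return X_big

# continuing the gpool toy example above: restore a 4-node graph
# X_restored = gunpool(X2, idx, num_nodes=4)
```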
The overall architecture of the graph U-Nets (the example in the paper):
Input graph: 7 nodes, each with 2 features.
GCN layer: transforms the input features into low-dimensional representations.
Two encoder blocks: a gPool layer → a GCN layer.
gPool layers: reduce the size of the graph to encode higher-order features.
GCN layers: aggregate information from each node's first-order neighbors.
Two decoder blocks: a gUnpool layer → a GCN layer.
gUnpool layers: restore the higher-resolution structure of the graph.
Skip connections between corresponding blocks of the encoder and decoder:
transmit spatial information to the decoders for better performance;
either feature-map addition or concatenation. (A forward-pass sketch putting these pieces together is given below.)
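To tie the pieces together, here is a rough forward-pass sketch of the whole graph U-Net, reusing the gpool and gunpool functions sketched above. The simplified gcn helper, the hidden size 8, and the pooled sizes $k=4$ and $k=2$ are my own assumptions for illustration, not the exact configuration from the paper:

```python
import torch

def gcn(A, X, W):
    """Simplified GCN layer: average each node with its first-order
    neighbours, then apply a linear map W (the paper uses the normalized A)."""
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    deg = A_hat.sum(dim=1, keepdim=True)
    return torch.relu((A_hat @ X / deg) @ W)

def graph_unet_forward(A, X, Ws, ps):
    # input GCN: embed the raw features into a low-dimensional space
    X0 = gcn(A, X, Ws[0])
    # ----- encoder: gPool -> GCN, twice -----
    A1, X1, idx1 = gpool(A, X0, ps[0], k=4); X1 = gcn(A1, X1, Ws[1])
    A2, X2, idx2 = gpool(A1, X1, ps[1], k=2); X2 = gcn(A2, X2, Ws[2])
    # ----- decoder: gUnpool -> GCN, twice, with additive skip connections -----
    U1 = gunpool(X2, idx2, num_nodes=A1.size(0)) + X1   # skip from encoder
    U1 = gcn(A1, U1, Ws[3])
    U0 = gunpool(U1, idx1, num_nodes=A.size(0)) + X0    # skip from encoder
    return gcn(A, U0, Ws[4])

# toy run: 7 nodes, 2 features per node
A = (torch.rand(7, 7) > 0.5).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(7, 2)
Ws = [torch.randn(2, 8)] + [torch.randn(8, 8) for _ in range(4)]
ps = [torch.randn(8), torch.randn(8)]
out = graph_unet_forward(A, X, Ws, ps)
print(out.shape)   # torch.Size([7, 8])
```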
Since the related edges are removed when nodes are removed in gPool, nodes in the pooled graph may become isolated. We therefore need to increase connectivity among the nodes in the pooled graph, which is done with the graph power operation.
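If I understand the paper correctly, the second graph power is used: the node selection is applied to $A^l A^l$ instead of $A^l$, so that nodes within two hops of each other remain connected after pooling:

$A^{l+1} = \left( A^{l} A^{l} \right)[\text{idx}, \text{idx}]$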
We cannot directly apply the $k$-max pooling used in CNNs to graphs, since it may result in inconsistency in the connectivity of the selected nodes. But why, when the selection is based on the 1D footprint of each node, is the connectivity in the new graph consistent?
How is the projection vector $\mathbf{p}$ trained?
Why does the gate operation make the projection vector trainable by back-propagation?
My English is really poor; next time I should be more concise and simpler. Keep going~ hhh