The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers.
For free-form image inpainting, vanilla convolution is not effective.
First, consider the formula for vanilla convolution:
$$O_{y,x}=\sum_{i=-k'_h}^{k'_h}\sum_{j=-k'_w}^{k'_w} W_{k'_h+i,\,k'_w+j}\cdot I_{y+i,\,x+j}$$
where $(y,x)$ is the output location, $k_h\times k_w$ is the kernel size, $k'_h=(k_h-1)/2$, $k'_w=(k_w-1)/2$, $W$ denotes the convolution filters, and $I$ and $O$ are the input and output.
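At every spatial location $I_{y,x}$ of every input channel, vanilla convolution applies the same filters. They have the same kernel size and, because weights are shared across positions, the same coefficients; only different output channels have different coefficients. This makes sense for classification and object detection, where every input pixel is valid for extracting local features in a sliding-window fashion. Sliding one bank of filters over all pixels therefore extracts the image's local features effectively.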
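To make the formula concrete, here is a minimal pure-Python sketch of single-channel vanilla convolution. It assumes:

- one input and one output channel;
- an odd kernel size;
- stride 1;
- zero padding at the borders.

`vanilla_conv2d` is a hypothetical helper name, not a library function. Like deep-learning frameworks, the formula does not flip the kernel (it is a cross-correlation), and the same `kernel` is used at every location.

```python
# Single-channel vanilla convolution following
#   O[y][x] = sum_{i=-kh'}^{kh'} sum_{j=-kw'}^{kw'} W[kh'+i][kw'+j] * I[y+i][x+j]
# Assumptions: odd kernel size, stride 1, zero padding outside the image.

def vanilla_conv2d(image, kernel):
    rh, rw = (len(kernel) - 1) // 2, (len(kernel[0]) - 1) // 2   # k'_h, k'_w
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(-rh, rh + 1):
                for j in range(-rw, rw + 1):
                    yy, xx = y + i, x + j
                    if 0 <= yy < height and 0 <= xx < width:   # zero padding
                        # The same weights are used at every (y, x): weight sharing.
                        acc += kernel[rh + i][rw + j] * image[yy][xx]
            out[y][x] = acc
    return out

ones = [[1.0] * 3 for _ in range(3)]
print(vanilla_conv2d(ones, ones))  # [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```

The smaller border values come only from zero padding. Vanilla convolution cannot tell padded (invalid) pixels from real ones, and the same limitation hurts it inside inpainting holes.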
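In image inpainting, however, the input features consist of valid pixels outside the holes together with one of two kinds of pixels inside the masked regions:

- invalid pixels, which mostly appear in shallow layers, since invalid pixels gradually become valid as the network gets deeper;
- synthesized pixels, which appear in deep layers.

Treating all of these alike introduces ambiguity during training and causes visual artifacts at test time, such as color discrepancy, blurriness and obvious edge responses.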
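The mask update of partial conv operates window by window and behaves like an erosion of the holes. As the layers get deeper, the black (hole) stripes in the mask should become thinner and thinner, which the illustration in Figure 3 supports. For further details, read the source code.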
To make the convolution depend only on valid pixels, partial conv introduces a masking and re-normalization step. The partial conv formula is:
$$O_{y,x}=\begin{cases}\sum\sum W\cdot\left(I\odot\dfrac{M}{\mathrm{sum}(M)}\right), & \text{if } \mathrm{sum}(M)>0\\[4pt] 0, & \text{otherwise}\end{cases}$$
where $M$ is the corresponding binary mask. After each partial convolution, the mask is updated with the following rule:
$$m'_{y,x}=\begin{cases}1, & \text{if } \mathrm{sum}(M)>0\\ 0, & \text{otherwise}\end{cases}$$
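The two rules above can be combined into one layer. Below is a minimal pure-Python sketch of a single-channel partial convolution that returns both the output and the updated mask. It assumes:

- stride 1;
- no bias;
- a square, odd kernel;
- pixels outside the image are treated as hole.

`partial_conv2d` and `hole_size` are hypothetical helper names. Dividing the masked sum by the number of valid pixels is algebraically the same as multiplying $I$ by $M/\mathrm{sum}(M)$.

```python
# Partial convolution, single channel:
#   O[y][x]  = sum sum W * (I ⊙ M / sum(M))  if sum(M) > 0, else 0
#   m'[y][x] = 1                              if sum(M) > 0, else 0
# sum(M) counts the valid pixels (mask == 1) in the k x k window around (y, x).

def partial_conv2d(image, mask, kernel):
    r = (len(kernel) - 1) // 2           # assumes a square kernel with odd size
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, m_sum = 0.0, 0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    yy, xx = y + i, x + j
                    if 0 <= yy < h and 0 <= xx < w:   # outside the image counts as hole
                        m = mask[yy][xx]
                        acc += kernel[r + i][r + j] * image[yy][xx] * m
                        m_sum += m
            if m_sum > 0:
                out[y][x] = acc / m_sum   # re-normalize by the number of valid pixels
                new_mask[y][x] = 1        # hard rule: one valid pixel is enough
    return out, new_mask

def hole_size(mask):
    return sum(row.count(0) for row in mask)

# A 7x7 mask with a 5x5 hole in the middle (0 = hole, 1 = valid).
mask = [[0 if 1 <= y <= 5 and 1 <= x <= 5 else 1 for x in range(7)] for y in range(7)]
image = [[1.0] * 7 for _ in range(7)]
box = [[1.0] * 3 for _ in range(3)]
sizes = [hole_size(mask)]
for _ in range(3):
    image, mask = partial_conv2d(image, mask, box)
    sizes.append(hole_size(mask))
print(sizes)  # [25, 9, 1, 0]
```

With a 3×3 kernel, the hole shrinks by one pixel on every side per layer, which is the erosion-like behaviour described earlier. A window containing a single valid pixel is marked valid exactly like a window containing nine.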
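Partial conv does improve inpainting on irregular masks, but it still has shortcomings. It can be regarded as a hard-gating, single-channel, un-learnable layer whose output is multiplied pixel-wise with the input feature map:

- hard-gating, because every location is set strictly to 0 or 1, so a window with one valid pixel counts the same as a window with nine;
- single-channel, because all channels share one mask;
- un-learnable, because the update follows a fixed rule.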
Gated conv drops the hard mask updated by a fixed rule and instead learns a soft mask automatically from data:
$$Gating_{y,x}=\sum\sum W_g\cdot I$$
$$Feature_{y,x}=\sum\sum W_f\cdot I$$
$$O_{y,x}=\phi\left(Feature_{y,x}\right)\odot\sigma\left(Gating_{y,x}\right)$$
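where $\sigma$ is the sigmoid function, which squashes the output gating values into the range 0 to 1. $\phi$ can be any activation function (e.g. ReLU or LeakyReLU), and $W_g$ and $W_f$ are two different convolutional filters.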
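The three equations can be sketched in pure Python for a single channel. This is an illustration, not the paper's implementation. It assumes:

- one input and one output channel;
- stride 1 and zero padding;
- no biases;
- LeakyReLU with slope 0.2 as $\phi$ (0.2 is also the default `alpha` of `tf.nn.leaky_relu`).

`conv2d_same`, `sigmoid`, `leaky_relu` and `gated_conv2d` are hypothetical helper names.

```python
import math

def conv2d_same(image, kernel):
    """Single-channel, stride-1 cross-correlation with zero padding (output keeps the input size)."""
    rh, rw = (len(kernel) - 1) // 2, (len(kernel[0]) - 1) // 2
    h, w = len(image), len(image[0])
    return [[sum(kernel[rh + i][rw + j] * image[y + i][x + j]
                 for i in range(-rh, rh + 1)
                 for j in range(-rw, rw + 1)
                 if 0 <= y + i < h and 0 <= x + j < w)
             for x in range(w)]
            for y in range(h)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def leaky_relu(z, alpha=0.2):
    return z if z > 0 else alpha * z

def gated_conv2d(image, w_feature, w_gating, phi=leaky_relu):
    feature = conv2d_same(image, w_feature)                  # Feature = sum sum W_f · I
    gating = conv2d_same(image, w_gating)                    # Gating  = sum sum W_g · I
    gate = [[sigmoid(g) for g in row] for row in gating]     # soft mask in (0, 1)
    out = [[phi(f) * s for f, s in zip(f_row, s_row)]        # phi(Feature) ⊙ sigma(Gating)
           for f_row, s_row in zip(feature, gate)]
    return out, gate

image = [[1.0, -1.0], [2.0, 0.5]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
out, gate = gated_conv2d(image, identity, zeros)
print(out)  # [[0.5, -0.1], [1.0, 0.25]]
```

With an all-zero gating kernel, every gate is $\sigma(0)=0.5$, so the layer just halves $\phi(Feature)$. In a trained network $W_g$ is learned, so each location (and, with several channels, each channel) gets its own soft gate instead of a hard 0/1 mask.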
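Gated conv lets the network learn a dynamic feature selection mechanism for each channel and each spatial location. Interestingly, the visualization of intermediate gating values in Figure 3 (row 3) shows that the network selects feature maps not only according to the background, the mask and the sketch, but also according to semantic segmentation in some channels. Even in deeper layers, gated conv learns to highlight the masked regions and the necessary sketch information in separate channels, which yields better inpainting results.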
Quantitative metrics: mean L1 loss, mean L2 loss, mean TV loss
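The metric names above do not pin down the exact normalization, so the following pure-Python sketch is only one plausible reading. It computes:

- mean absolute error;
- mean squared error;
- an anisotropic total variation of the inpainted image: the mean absolute difference between vertical neighbours plus the same quantity for horizontal neighbours.

Function names are hypothetical, and images are assumed to be single-channel lists of lists.

```python
def _mean(values):
    return sum(values) / len(values) if values else 0.0

def mean_l1(pred, target):
    return _mean([abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)])

def mean_l2(pred, target):
    return _mean([(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)])

def mean_tv(img):
    # Anisotropic TV: average |vertical differences| + average |horizontal differences|.
    h, w = len(img), len(img[0])
    vertical = [abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w)]
    horizontal = [abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1)]
    return _mean(vertical) + _mean(horizontal)

pred = [[0.0, 1.0], [2.0, 3.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
print(mean_l1(pred, target), mean_l2(pred, target), mean_tv(pred))  # 1.5 3.5 3.0
```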
Qualitative (visual) results
Implementation reference: SC-FEGAN
The `gate_conv` code uses `tf.nn.lrn`. Its docstring reads:

    Local Response Normalization.

    The 4-D `input` tensor is treated as a 3-D array of 1-D vectors (along the last
    dimension), and each vector is normalized independently. Within a given vector,
    each component is divided by the weighted, squared sum of inputs within
    `depth_radius`. In detail,

        sqr_sum[a, b, c, d] =
            sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
        output = input / (bias + alpha * sqr_sum) ** beta

Here a is the batch index and d is the channel index.
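The docstring formula can be reproduced for the channel vector of a single pixel `[a, b, c, :]`. This is a pure-Python sketch, not TensorFlow's kernel; `lrn` is a hypothetical helper name. The defaults `depth_radius=5, bias=1, alpha=1, beta=0.5` mirror those documented for `tf.nn.lrn`, and the window is clipped at the first and last channel.

```python
def lrn(vector, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Local response normalization of one channel vector, per the tf.nn.lrn docstring."""
    n = len(vector)
    out = []
    for d in range(n):
        lo, hi = max(0, d - depth_radius), min(n, d + depth_radius + 1)  # clip the window
        sqr_sum = sum(v * v for v in vector[lo:hi])
        out.append(vector[d] / (bias + alpha * sqr_sum) ** beta)
    return out

print(lrn([3.0, 4.0], depth_radius=1, bias=0.0))  # [0.6, 0.8]
```

Suppose `bias` is close to 0 (as with `gate_conv`'s `bias=0.00005`), `alpha=1`, `beta=0.5`, and the window covers all channels. Then LRN is close to L2-normalizing each pixel's channel vector.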
The formula is as follows:
$$y(\mathbf{X})=(\mathbf{X}*\mathbf{W}+\mathbf{b})\otimes\sigma(\mathbf{X}*\mathbf{V}+\mathbf{c})$$
where W and V are two different convolution kernels, and b and c are their biases.
The TensorFlow implementation is as follows:
import tensorflow as tf  # TF 1.x API (tf.layers, tf.variable_scope); under TF 2.x use tf.compat.v1


def gate_conv(x_in, cnum, ksize, stride=1, rate=1, name='conv', padding='SAME',
              activation='leaky_relu', use_lrn=True, training=True):
    assert padding in ['SYMMETRIC', 'SAME', 'REFLECT']
    if padding == 'SYMMETRIC' or padding == 'REFLECT':
        # Pad manually, then run a 'VALID' conv on the padded tensor so the spatial size is kept.
        p = int(rate * (ksize - 1) / 2)
        x_in = tf.pad(x_in, [[0, 0], [p, p], [p, p], [0, 0]], mode=padding)
        padding = 'VALID'
    # Feature branch: conv -> optional LRN -> activation.
    x = tf.layers.conv2d(
        x_in, cnum, ksize, stride, dilation_rate=rate,
        activation=None, padding=padding, name=name)
    if use_lrn:
        x = tf.nn.lrn(x, bias=0.00005)
    if activation == 'leaky_relu':
        x = tf.nn.leaky_relu(x)
    # Gating branch: a separate conv squashed by a sigmoid into a soft mask in (0, 1).
    g = tf.layers.conv2d(
        x_in, cnum, ksize, stride, dilation_rate=rate,
        activation=tf.nn.sigmoid, padding=padding, name=name + '_g')
    x = tf.multiply(x, g)
    return x, g
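The manual padding branch of `gate_conv` relies on a bit of size arithmetic. Padding by `p = int(rate * (ksize - 1) / 2)` on each side and then running a 'VALID' convolution keeps the spatial size for stride 1 and an odd `ksize`. Below is a small pure-Python check of the standard output-size formula, using the hypothetical helper names `conv_output_size` and `manual_pad`:

```python
def conv_output_size(size, ksize, stride=1, rate=1, pad=0):
    """Output size of a 'VALID' convolution on an input padded by `pad` on each side."""
    extent = rate * (ksize - 1) + 1          # span covered by the dilated kernel
    return (size + 2 * pad - extent) // stride + 1

def manual_pad(ksize, rate=1):
    return int(rate * (ksize - 1) / 2)       # same expression as in gate_conv

print(conv_output_size(256, 3, rate=2, pad=manual_pad(3, rate=2)))  # 256
print(conv_output_size(256, 3, rate=2))                              # 252
print(conv_output_size(256, 5, stride=2, pad=manual_pad(5)))         # 128
```

Without the padding, a 'VALID' convolution with `ksize=3, rate=2` loses `rate * (ksize - 1) = 4` pixels per dimension. That is why the padded tensor, not the unpadded input, has to be fed to both convolutions.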
def gate_deconv(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
                name="deconv", training=True):
    with tf.variable_scope(name):
        # filter: [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        # Feature branch: transposed conv + bias + LeakyReLU.
        deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                        strides=[1, d_h, d_w, 1])
        biases = tf.get_variable('biases1', [output_shape[-1]],
                                 initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        deconv = tf.nn.leaky_relu(deconv)
        # Gating branch: its own kernel (V in the formula) and bias c, then a sigmoid.
        w_g = tf.get_variable('w_g', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                              initializer=tf.random_normal_initializer(stddev=stddev))
        g = tf.nn.conv2d_transpose(input_, w_g, output_shape=output_shape,
                                   strides=[1, d_h, d_w, 1])
        b = tf.get_variable('biases2', [output_shape[-1]],
                            initializer=tf.constant_initializer(0.0))
        g = tf.reshape(tf.nn.bias_add(g, b), deconv.get_shape())
        g = tf.nn.sigmoid(g)  # gate computed from the gating logits
        deconv = tf.multiply(g, deconv)
        return deconv, g