[References]
[1] https://github.com/walid0925/AI_Artistry
[2] A Neural Algorithm of Artistic Style
Note: this post mainly records my understanding of the paper and of the code in reference [1].
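At its core, the algorithm relies on the abstract representations that a deep convolutional network builds of an input image. It consists of three main parts:
As shown in the figure below: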
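VGG is a deep convolutional network developed by the computer vision group at the University of Oxford together with Google DeepMind. Its hallmark is the repeated use of small 3x3 convolution kernels and 2x2 pooling layers. VGG16 is the 16-layer variant of VGG; its model implementation can be found in keras-applications/vgg16.py. An excerpt: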
# Block 1
x = layers.Conv2D(64, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block1_conv1')(img_input)
x = layers.Conv2D(64, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block1_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# (Blocks 2 to 5 follow the same conv/pool pattern and are omitted here)

# Classification block (only built when include_top=True)
x = layers.Flatten(name='flatten')(x)
x = layers.Dense(4096, activation='relu', name='fc1')(x)
x = layers.Dense(4096, activation='relu', name='fc2')(x)
x = layers.Dense(classes, activation='softmax', name='predictions')(x)
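To make the effect of the 3x3 'same'-padded convolutions and 2x2 pooling concrete, here is a small sketch that tracks the feature-map shape through the five convolutional blocks. The helper name `vgg16_block_shapes` is made up for this illustration; the filter counts (64, 128, 256, 512, 512) are those of the standard VGG16 architecture.

```python
def vgg16_block_shapes(height, width):
    """Return the (height, width, channels) shape after each VGG16 block's pooling layer."""
    filters_per_block = [64, 128, 256, 512, 512]  # standard VGG16 block widths
    shapes = []
    for filters in filters_per_block:
        # 'same'-padded 3x3 convolutions keep height and width unchanged;
        # the 2x2 max-pool with stride 2 halves both (rounding down).
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes

shapes = vgg16_block_shapes(224, 224)
for i, shape in enumerate(shapes, start=1):
    print(f"block{i}_pool output: {shape}")
```

For a 224x224 input the spatial size shrinks from 112x112 after block 1 to 7x7 after block 5, while the channel count grows from 64 to 512.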
Three networks are built from VGG16, one each for the content image input, the style image input, and the white-noise image:
cModel = VGG16(include_top=False, weights='imagenet', input_tensor=cImArr)
sModel = VGG16(include_top=False, weights='imagenet', input_tensor=sImArr)
gModel = VGG16(include_top=False, weights='imagenet', input_tensor=gImPlaceholder)
The content features are taken from layer 'block4_conv2'.
The style features are taken from layers 'block1_conv1', 'block2_conv1', 'block3_conv1' and 'block4_conv1'.
P = get_feature_reps(x=cImArr, layer_names=[cLayerName], model=cModel)[0]
As = get_feature_reps(x=sImArr, layer_names=sLayerNames, model=sModel)
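The get_feature_reps function simply fetches the network's output at the requested layers and returns one feature matrix per layer. For the style features several layers are requested, so the whole list is kept. For the content features only one layer is requested, so the trailing [0] picks the single entry of the returned list (it selects a layer, not one of the RGB colour channels).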
for ln in layer_names:
    selectedLayer = model.get_layer(ln)
    featRaw = selectedLayer.output  # output tensor of this layer
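The excerpt stops at the raw layer output. The Gram-matrix and style-loss code further down treats each feature as a 2-D matrix whose first axis is the filter index (N_l) and whose second axis is the spatial position (M_l), so the raw (1, H, W, C) activation has to be flattened at some point. The NumPy sketch below shows one way to do that; `to_feature_matrix` is a hypothetical helper written for this illustration, not a function from reference [1].

```python
import numpy as np

def to_feature_matrix(feat_raw):
    """Turn a (1, H, W, C) activation into a (C, H*W) matrix: one row per filter."""
    _, h, w, c = feat_raw.shape
    # Rows of the reshaped array are spatial positions, columns are filters;
    # the transpose puts filters on the first axis.
    return feat_raw.reshape(h * w, c).T

# Stand-in for a layer output: 1 image, 4x4 spatial positions, 3 filters.
feat_raw = np.arange(1 * 4 * 4 * 3, dtype=np.float64).reshape(1, 4, 4, 3)
feat_matrix = to_feature_matrix(feat_raw)
print(feat_matrix.shape)  # (3, 16)
```

Row k of `feat_matrix` is the vectorized response map of filter k, which is the form the Gram matrix needs.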
xopt, f_val, info= fmin_l_bfgs_b(calculate_loss, x_val, fprime=get_grad, maxiter=iterations, disp=True)
xOut = postprocess_array(xopt)
xIm = save_original_size(xOut)
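The fmin_l_bfgs_b call is the core of the training; x_val holds the white-noise image. Using the loss function and the gradient computation defined in "A Neural Algorithm of Artistic Style" [2], the white-noise image is optimized step by step. After enough iterations its VGG16 outputs at the selected layers come close to the corresponding outputs for the style image and the content image, which produces the final result. The loss function and the gradient computation are examined next.
Remark: later papers that improve on this method mostly follow the same basic idea and change only the choice of convolutional network or the definition of the loss function.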
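The calling convention of fmin_l_bfgs_b is easier to see on a toy problem. In the sketch below a small quadratic stands in for the VGG16-based loss, and `target`, `toy_loss` and `toy_grad` are names invented for this illustration. As in the real code, the optimizer works on a flat float64 vector, so both callbacks reshape it before use and the gradient is flattened again before being returned.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Stand-in for the features the generated image should match (assumption for the demo).
target = np.array([[1.0, 2.0], [3.0, 4.0]])

def toy_loss(x_flat):
    # The optimizer passes a flat vector; restore the working shape first.
    x = x_flat.reshape(target.shape)
    return 0.5 * np.sum((x - target) ** 2)

def toy_grad(x_flat):
    x = x_flat.reshape(target.shape)
    # The gradient must come back flat and as float64.
    return (x - target).flatten().astype('float64')

x0 = np.zeros(target.size)  # plays the role of the initial noise image
x_opt, f_val, info = fmin_l_bfgs_b(toy_loss, x0, fprime=toy_grad, maxiter=50)
print(x_opt.reshape(target.shape))
print(f_val, info['warnflag'])
```

The return values mirror those in the article's call: the optimized vector, the final loss value, and a dictionary with convergence information.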
3.1 calculate_loss
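def get_total_loss(gImPlaceholder, alpha=1.0, beta=10000.0):
    # Key steps:
    # 1. gImPlaceholder is the white-noise image, used as the input of gModel;
    #    this input should be updated on every iteration.
    # 2. get_content_loss measures its difference from the content features.
    # 3. get_style_loss measures its difference from the style features.
    # 4. The two differences are combined into the total loss.
    F = get_feature_reps(gImPlaceholder, layer_names=[cLayerName], model=gModel)[0]
    Gs = get_feature_reps(gImPlaceholder, layer_names=sLayerNames, model=gModel)
    contentLoss = get_content_loss(F, P)
    styleLoss = get_style_loss(ws, Gs, As)
    totalLoss = alpha*contentLoss + beta*styleLoss
    return totalLoss

def calculate_loss(gImArr):
    # The optimizer passes a flat vector; restore the image shape.
    # (The original check compared against targetWidth twice; the second
    # spatial dimension should be targetHeight, as in get_grad.)
    if gImArr.shape != (1, targetWidth, targetHeight, 3):
        gImArr = gImArr.reshape((1, targetWidth, targetHeight, 3))
    loss_fcn = K.function([gModel.input], [get_total_loss(gModel.input)])
    return loss_fcn([gImArr])[0].astype('float64')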
F is the content-feature output for the noise image; P is the feature output for the content image.
$$L_{content} = \frac{1}{2} \sum_{ij} (F_{ij} - P_{ij})^2$$
def get_content_loss(F, P):
    cLoss = 0.5*K.sum(K.square(F - P))
    return cLoss
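As a quick sanity check of the formula, the same computation can be done by hand with NumPy on two tiny matrices. `content_loss_np` is a stand-in written for this illustration; it mirrors get_content_loss but runs on plain arrays instead of Keras tensors.

```python
import numpy as np

def content_loss_np(F, P):
    """Half the sum of squared differences between two feature matrices."""
    return 0.5 * np.sum(np.square(F - P))

F = np.array([[1.0, 2.0], [3.0, 4.0]])  # features of the generated image
P = np.array([[1.0, 0.0], [3.0, 1.0]])  # features of the content image
loss_value = content_loss_np(F, P)
print(loss_value)  # 0.5 * (0 + 4 + 0 + 9) = 6.5
```

The loss is zero only when the two feature matrices agree in every entry.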
3.2 get_style_loss
The Gram matrix is the inner product of the vectorized feature maps:
$$G_{ij} = \sum_k F_{ik} F_{jk}$$
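A small NumPy sketch may help show what the Gram matrix does and does not capture. Here each row of F is one vectorized feature map and each column one spatial position; `gram_matrix_np` is a name chosen for this illustration. Because the sum runs over positions k, shuffling the positions leaves G unchanged, so G records which filters respond together but not where in the image they respond.

```python
import numpy as np

def gram_matrix_np(F):
    """G[i, j] = sum_k F[i, k] * F[j, k], i.e. F times its transpose."""
    return F @ F.T

F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])  # 2 filters, 3 spatial positions
G = gram_matrix_np(F)
print(G)  # [[14.  2.]
          #  [ 2.  1.]]

# Reordering the spatial positions (columns) does not change the Gram matrix.
G_shuffled = gram_matrix_np(F[:, [2, 0, 1]])
print(np.array_equal(G, G_shuffled))  # True
```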
Style loss for a single layer:
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{ij} (G_{ij} - A_{ij})^2$$
Total style loss:
$$L_{style} = \sum_l w_l E_l$$
where $w_l$ is the weighting factor of each layer; in this code the weights are all set to 1.
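def get_Gram_matrix(F):
    # F holds one vectorized feature map per row, so F * F^T is the Gram matrix
    G = K.dot(F, K.transpose(F))
    return G

def get_style_loss(ws, Gs, As):
    sLoss = K.variable(0.)
    # One term per style layer: weight, generated-image features, style-image features
    for w, G, A in zip(ws, Gs, As):
        M_l = K.int_shape(G)[1]  # second axis of the feature matrix (M_l in the formula)
        N_l = K.int_shape(G)[0]  # first axis of the feature matrix (N_l in the formula)
        G_gram = get_Gram_matrix(G)
        A_gram = get_Gram_matrix(A)
        sLoss += w*0.25*K.sum(K.square(G_gram - A_gram)) / (N_l**2 * M_l**2)
    return sLoss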
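The per-layer formula and the weighted sum can be checked by hand on a tiny example. The NumPy version below mirrors get_style_loss on plain arrays; the function names and the 2x2 feature matrices are made up for the demonstration.

```python
import numpy as np

def layer_style_loss_np(G_feat, A_feat):
    """E_l for one layer, given (N_l, M_l) feature matrices of both images."""
    N_l, M_l = G_feat.shape
    G_gram = G_feat @ G_feat.T
    A_gram = A_feat @ A_feat.T
    return 0.25 * np.sum(np.square(G_gram - A_gram)) / (N_l**2 * M_l**2)

def style_loss_np(ws, Gs, As):
    """Weighted sum of the per-layer losses."""
    return sum(w * layer_style_loss_np(G, A) for w, G, A in zip(ws, Gs, As))

G_feat = np.array([[1.0, 0.0], [0.0, 1.0]])  # generated image, one layer
A_feat = np.array([[1.0, 1.0], [0.0, 1.0]])  # style image, same layer
style_value = style_loss_np([1.0], [G_feat], [A_feat])
print(style_value)  # 0.046875
```

Here the two Gram matrices are [[1, 0], [0, 1]] and [[2, 1], [1, 1]], their squared differences sum to 3, and 0.25 * 3 / (2^2 * 2^2) gives 0.046875.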
3.3 get_grad
def get_grad(gImArr):
    if gImArr.shape != (1, targetWidth, targetHeight, 3):
        gImArr = gImArr.reshape((1, targetWidth, targetHeight, 3))
    grad_fcn = K.function([gModel.input], K.gradients(get_total_loss(gModel.input), [gModel.input]))
    grad = grad_fcn([gImArr])[0].flatten().astype('float64')
    return grad
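get_grad leaves the differentiation to K.gradients, so nothing in the article shows what a correct gradient looks like. A common way to build confidence in such a callback is a finite-difference check. The sketch below does this for a toy case in which the optimized vector is compared directly with a target P (that is, the network is replaced by the identity, an assumption made purely for illustration), where the analytic gradient of the content loss is simply the difference between the two. All function names are hypothetical.

```python
import numpy as np

def loss(x_flat, P):
    return 0.5 * np.sum((x_flat.reshape(P.shape) - P) ** 2)

def analytic_grad(x_flat, P):
    # Same post-processing as get_grad: flatten and cast to float64.
    return (x_flat.reshape(P.shape) - P).flatten().astype('float64')

def numeric_grad(f, x_flat, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(x_flat)
    for i in range(x_flat.size):
        step = np.zeros_like(x_flat)
        step[i] = eps
        grad[i] = (f(x_flat + step) - f(x_flat - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
P = rng.normal(size=(2, 3))
x = rng.normal(size=P.size)
max_err = np.max(np.abs(analytic_grad(x, P) - numeric_grad(lambda v: loss(v, P), x)))
print(max_err)
```

If the two gradients agree up to rounding error, the pair of callbacks handed to the optimizer is consistent.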