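It contains multiple convolutional layers, and we compute the content loss from the feature maps output by layer conv4_2. How? Feed the content image into the network, then feed a randomly generated noise image into the same network, take the feature maps each of them produces at this layer, and finally compare the two maps element by element (pixel by pixel), summing the squared differences.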
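A minimal sketch of this content loss in plain Python, assuming the two feature maps have already been extracted from the same layer and flattened into equal-length lists of numbers. The helper name `content_loss` is hypothetical; the factor of one half follows the original paper by Gatys et al.

```python
def content_loss(generated_features, content_features):
    """Half the sum of squared element-wise differences between two feature maps.

    Both arguments are flat lists of floats taken from the same layer (e.g. conv4_2),
    one for the generated image and one for the content image.
    """
    if len(generated_features) != len(content_features):
        raise ValueError("feature maps must have the same size")
    return 0.5 * sum((f - p) ** 2 for f, p in zip(generated_features, content_features))


# Identical activations give zero loss; each differing element adds (difference ** 2) / 2.
print(content_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(content_loss([1.0, 2.0, 3.0], [1.0, 0.0, 3.0]))  # 2.0
```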





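  • There are 64 feature maps in total, which is also the depth of the output, i.e. its number of channels
  • Each feature map is one interpretation of the previous layer's output; think of it as 64 people each interpreting the same painting differently
  • These people may prefer different styles: Impressionism, Modernism, Surrealism, Expressionism, and so on
  • When the image is in one style, one group of them may love it while another group dislikes it
  • When the image is in another style, the first group may dislike it while the other group loves it
  • The differences among these 64 interpretations can be captured by the cross-correlations between feature maps; here we compute them with a Gram matrix
  • Different styles produce different cross-correlation patterns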

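  The Gram matrix is computed as follows. With 64 feature maps, the Gram matrix is $64\times 64$. Each feature map is flattened into a vector, and the entry in row $i$, column $j$ is the cross-correlation between feature map $i$ and feature map $j$, computed as the inner product of their vectors.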
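The computation can be sketched in plain Python, assuming each feature map has already been flattened into a list of its H×W activations. The helper `gram_matrix` and the toy values are hypothetical. Entry (i, j) is the inner product of maps i and j, so the matrix is symmetric and its size depends only on the number of maps, not on the image size:

```python
def gram_matrix(feature_maps):
    """Return the C x C Gram matrix of C flattened feature maps.

    feature_maps: list of C lists, each holding the H*W activations of one map.
    Entry [i][j] is the inner product of map i and map j.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in feature_maps]
            for fi in feature_maps]


# Two toy "styles", each with three 2x2 feature maps flattened to length 4.
# In style_a, maps 0 and 1 fire at the same positions; in style_b they never overlap.
style_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
style_b = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]

print(gram_matrix(style_a))  # [[2, 2, 0], [2, 2, 0], [0, 0, 2]]
print(gram_matrix(style_b))  # [[2, 0, 0], [0, 2, 2], [0, 2, 2]]
```

Every map is equally active in both toy inputs (the diagonals match), but the off-diagonal entries differ. This co-activation pattern is what lets the Gram matrix tell the two "styles" apart.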





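$$L_{total}(\vec{p},\vec{a},\vec{x})=\alpha L_{content}(\vec{p},\vec{x})+\beta L_{style}(\vec{a},\vec{x})$$

Here $\vec{p}$ is the content image, $\vec{a}$ the style image, $\vec{x}$ the generated image, and $\alpha$ and $\beta$ weight the content and style losses.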
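For reference, the two terms are defined as follows in the original paper by Gatys et al. (the code later in this article normalizes its losses differently). $F^l$ and $P^l$ are the layer-$l$ feature maps of $\vec{x}$ and $\vec{p}$, $G^l$ and $A^l$ are the Gram matrices of $\vec{x}$ and $\vec{a}$ at layer $l$, $N_l$ is the number of feature maps and $M_l$ the number of positions (height × width) in layer $l$, and $w_l$ weights each style layer. The content term is evaluated at a single layer $l$ (conv4_2 here):

$$L_{content}(\vec{p},\vec{x},l)=\frac{1}{2}\sum_{i,j}\left(F^l_{ij}-P^l_{ij}\right)^2$$

$$G^l_{ij}=\sum_{k}F^l_{ik}F^l_{jk}$$

$$E_l=\frac{1}{4N_l^2M_l^2}\sum_{i,j}\left(G^l_{ij}-A^l_{ij}\right)^2,\qquad L_{style}(\vec{a},\vec{x})=\sum_{l}w_lE_l$$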








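  • Transformation network: its parameters are trained; it converts a content image into the stylized image
  • Loss network: computes the style loss between the stylized image and the style image, and the content loss between the stylized image and the original content image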




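In plain terms: we keep the loss network fixed and train the transformation network so that it can convert any input image into the style of $y_s$. This requires a large number of training images $x$, so that many different $x$ all learn to take on the style of $y_s$. After training, we save the model. At inference time the loss network is no longer needed: we simply feed the image to be stylized into the transformation network, and it produces the corresponding stylized image.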


```python
import os
import glob

import cv2
import matplotlib.pyplot as plt
import numpy as np
import scipy.io
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session, tf.contrib)
from imageio import imread, imsave
from tqdm import tqdm

# Image preprocessing
style_images = glob.glob('styles/*.jpg')  # paths of the style images


# Load the content images: skip grayscale images and bring them to a fixed size.
# No normalization yet, so pixel values stay in the range 0 to 255.
def resize_and_crop(image, image_size):
    h = image.shape[0]  # height
    w = image.shape[1]  # width
    # Keep the central square
    if h > w:
        # Taller than wide: crop rows
        top = (h - w) // 2
        image = image[top: top + w, :, :]
    else:
        # Wider than (or as wide as) tall: crop columns
        left = (w - h) // 2
        image = image[:, left: left + h, :]
    image = cv2.resize(image, (image_size, image_size))
    return image


X_data = []
image_size = 256
# Load the training set
paths = glob.glob('train2014/*.jpg')
for path in tqdm(paths):
    image = imread(path)
    if len(image.shape) < 3:  # skip images that are not in color
        continue
    X_data.append(resize_and_crop(image, image_size))  # crop and resize the accepted images
X_data = np.array(X_data)
print(X_data.shape)  # 82216 images; as uint8 this array takes about 16 GB of memory
```
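The center-crop arithmetic can be checked without loading any images. This plain-Python sketch (the helper names `center_crop_box` and `naive_crop_rows` are hypothetical) computes the offsets of the largest centered square, and also shows why slicing with `h // 2 - w // 2: h // 2 + w // 2` keeps one row too few when the shorter side is odd. The resize to 256×256 hides this, at the cost of a slight aspect distortion.

```python
def center_crop_box(h, w):
    """Return (top, left, side) of the largest centered square in an h x w image."""
    side = min(h, w)
    return (h - side) // 2, (w - side) // 2, side


def naive_crop_rows(h, w):
    """Number of rows kept by image[h // 2 - w // 2: h // 2 + w // 2] when h > w."""
    return (h // 2 + w // 2) - (h // 2 - w // 2)


print(center_crop_box(480, 640))  # (0, 80, 480)
print(center_crop_box(641, 427))  # (107, 0, 427)
print(naive_crop_rows(641, 427))  # 426: one row short of a 427 x 427 square
```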
```python
# Load the VGG19 model trained in MATLAB (MatConvNet .mat file)
vgg = scipy.io.loadmat('imagenet-vgg-verydeep-19.mat')
vgg_layers = vgg['layers']


def vgg_endpoints(inputs, reuse=None):
    """Build VGG19 on `inputs` with frozen weights and return a dict of every layer's output."""
    with tf.variable_scope('endpoints', reuse=reuse):
        def _weights(layer, expected_layer_name):
            # Read kernel W and bias b of layer index `layer` from the .mat structure
            W = vgg_layers[0][layer][0][0][2][0][0]
            b = vgg_layers[0][layer][0][0][2][0][1]
            layer_name = vgg_layers[0][layer][0][0][0][0]
            assert layer_name == expected_layer_name
            return W, b

        def _conv2d_relu(prev_layer, layer, layer_name):
            W, b = _weights(layer, layer_name)
            # Constants, not variables: the loss network is never trained
            W = tf.constant(W)
            b = tf.constant(np.reshape(b, (b.size)))
            return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

        def _avgpool(prev_layer):
            # Average pooling in place of VGG's original max pooling
            return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        graph = {}
        graph['conv1_1'] = _conv2d_relu(inputs, 0, 'conv1_1')
        graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
        graph['avgpool1'] = _avgpool(graph['conv1_2'])
        graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
        graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
        graph['avgpool2'] = _avgpool(graph['conv2_2'])
        graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
        graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
        graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
        graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
        graph['avgpool3'] = _avgpool(graph['conv3_4'])
        graph['conv4_1'] = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
        graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
        graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
        graph['conv4_4'] = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
        graph['avgpool4'] = _avgpool(graph['conv4_4'])
        graph['conv5_1'] = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
        graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
        graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
        graph['conv5_4'] = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
        graph['avgpool5'] = _avgpool(graph['conv5_4'])
        return graph
```
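The hard-coded layer indices (0, 2, 5, 7, 10 and so on) come from the layer list stored in the .mat file. Assuming the usual MatConvNet layout, that list places a ReLU after every convolution and a pooling layer after every block. This plain-Python sketch (the helper `vgg19_layer_names` is hypothetical) rebuilds that ordering and recovers the indices passed to `_conv2d_relu`:

```python
def vgg19_layer_names():
    """Layer order of VGG19's convolutional part: conv/relu pairs, then one pool per block."""
    convs_per_block = [2, 2, 4, 4, 4]
    names = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for k in range(1, n_convs + 1):
            names.append('conv%d_%d' % (block, k))
            names.append('relu%d_%d' % (block, k))
        names.append('pool%d' % block)
    return names


names = vgg19_layer_names()
conv_index = {name: i for i, name in enumerate(names) if name.startswith('conv')}
print(conv_index['conv1_2'], conv_index['conv2_1'], conv_index['conv4_2'], conv_index['conv5_4'])
# 2 5 21 34
```

Each entry matches the index used in `vgg_endpoints`. The fully connected layers that follow pool5 in the file are never read.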
```python
# Pick one style image, subtract the per-channel mean color, get its outputs at the
# VGG19 layers, and compute the Gram matrix of each of the four style layers
style_index = 1  # the second style image, so this trains a network for that style
X_style_data = resize_and_crop(imread(style_images[style_index]), image_size)
X_style_data = np.expand_dims(X_style_data, 0)  # add a batch dimension: the network expects 4-D input
print(X_style_data.shape)

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))  # ImageNet RGB means
X_style = tf.placeholder(dtype=tf.float32, shape=X_style_data.shape, name='X_style')
style_endpoints = vgg_endpoints(X_style - MEAN_VALUES)

# The style is measured at these four layers
STYLE_LAYERS = ['conv1_2', 'conv2_2', 'conv3_3', 'conv4_3']
style_features = {}
sess = tf.Session()
for layer_name in STYLE_LAYERS:
    # Evaluate this layer's output for the style image
    features = sess.run(style_endpoints[layer_name], feed_dict={X_style: X_style_data})
    features = np.reshape(features, (-1, features.shape[3]))  # shape (H*W, C)
    # Gram matrix of shape (C, C), normalized by H*W*C
    gram = np.matmul(features.T, features) / features.size
    style_features[layer_name] = gram
```
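Dividing by `features.size` (that is, H × W × C) turns the Gram matrix into an average co-activation, which keeps the values comparable across image resolutions. The plain-Python sketch below (the helper `normalized_gram` is hypothetical) uses the same (H*W, C) layout as the `np.reshape` above. It shows that repeating every spatial position, a crude stand-in for a larger image with the same texture, leaves the normalized Gram matrix unchanged:

```python
def normalized_gram(features):
    """features: list of H*W rows, each a list of C channel values (the (H*W, C) layout).

    Returns the C x C matrix features.T @ features divided by H * W * C.
    """
    positions = len(features)
    channels = len(features[0])
    size = positions * channels
    return [[sum(row[i] * row[j] for row in features) / size for j in range(channels)]
            for i in range(channels)]


small = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]  # 4 positions, 2 channels
large = small * 2  # the same activations over twice as many positions

print(normalized_gram(small))  # [[0.28125, 0.15625], [0.15625, 0.28125]]
print(normalized_gram(small) == normalized_gram(large))  # True
```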
```python
# Definition of the transformation network
batch_size = 4
X = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3], name='X')
# Weight initializer: truncated normal with mean 0 and standard deviation 0.1
k_initializer = tf.truncated_normal_initializer(0, 0.1)


def relu(x):
    return tf.nn.relu(x)


def conv2d(inputs, filters, kernel_size, strides):
    p = kernel_size // 2
    # 'reflect' mirrors the border pixels: p pixels are added at the top and bottom,
    # and p at the left and right ('SAME' would pad with zeros; 'VALID' adds no padding).
    # Each spatial size grows by 2p, so a stride-1 'valid' convolution keeps the size unchanged.
    h0 = tf.pad(inputs, [[0, 0], [p, p], [p, p], [0, 0]], mode='reflect')
    return tf.layers.conv2d(inputs=h0, filters=filters, kernel_size=kernel_size, strides=strides,
                            padding='valid', kernel_initializer=k_initializer)


def deconv2d(inputs, filters, kernel_size, strides):
    # Upsampling via nearest-neighbor resize + convolution (not a transposed convolution)
    shape = tf.shape(inputs)
    height, width = shape[1], shape[2]
    # Enlarge height and width by a factor of 2 * strides ...
    h0 = tf.image.resize_images(inputs, [height * strides * 2, width * strides * 2],
                                tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    # ... then the strided convolution shrinks it by `strides`: net result is 2x the input size
    return conv2d(h0, filters, kernel_size, strides)


def instance_norm(inputs):
    # Normalize each image (and each channel) separately
    return tf.contrib.layers.instance_norm(inputs)


# Residual block
def residual(inputs, filters, kernel_size):
    h0 = relu(conv2d(inputs, filters, kernel_size, 1))
    h0 = conv2d(h0, filters, kernel_size, 1)
    return tf.add(inputs, h0)


with tf.variable_scope('transformer', reuse=None):
    # Reflection-pad 10 pixels on each side; they are cropped off again at the end
    h0 = tf.pad(X - MEAN_VALUES, [[0, 0], [10, 10], [10, 10], [0, 0]], mode='reflect')
    h0 = relu(instance_norm(conv2d(h0, 32, 9, 1)))
    h0 = relu(instance_norm(conv2d(h0, 64, 3, 2)))   # downsample x2
    h0 = relu(instance_norm(conv2d(h0, 128, 3, 2)))  # downsample x2
    # Followed by five residual blocks in a row
    for i in range(5):
        h0 = residual(h0, 128, 3)
    h0 = relu(instance_norm(deconv2d(h0, 64, 3, 2)))  # upsample x2
    h0 = relu(instance_norm(deconv2d(h0, 32, 3, 2)))  # upsample x2
    h0 = tf.nn.tanh(instance_norm(conv2d(h0, 3, 9, 1)))
    h0 = (h0 + 1) / 2 * 255.  # map tanh's (-1, 1) to the pixel range (0, 255)
    shape = tf.shape(h0)
    g = tf.slice(h0, [0, 10, 10, 0], [-1, shape[1] - 20, shape[2] - 20, -1], name='g')
```
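Because of the two stride-2 convolutions and the two ×2 upsampling steps, the output is not always exactly the size of the input. This plain-Python sketch (the function names are hypothetical) replays the size arithmetic of the layers above: reflection padding of `kernel_size // 2` followed by a 'valid' convolution, the 10-pixel outer padding, and the final crop.

```python
def conv_out(n, kernel_size, stride):
    """Spatial size after conv2d above: reflect-pad k // 2 per side, then a 'valid' conv."""
    p = kernel_size // 2
    return (n + 2 * p - kernel_size) // stride + 1


def upsample_out(n, kernel_size=3, stride=2):
    """Spatial size after deconv2d above: nearest-neighbor resize by 2 * stride, then conv2d."""
    return conv_out(n * stride * 2, kernel_size, stride)


def transformer_out(n):
    """Height (or width) of g for an input of height (or width) n."""
    n = n + 20             # 10-pixel reflection padding on both sides
    n = conv_out(n, 9, 1)  # 9x9, stride 1: size unchanged
    n = conv_out(n, 3, 2)  # downsample
    n = conv_out(n, 3, 2)  # downsample
    n = conv_out(n, 3, 1)  # residual-block convs keep the size (one shown)
    n = upsample_out(n)    # x2
    n = upsample_out(n)    # x2
    n = conv_out(n, 9, 1)
    return n - 20          # crop the padding back off


for size in (256, 300, 301):
    print(size, '->', transformer_out(size))
# 256 -> 256
# 300 -> 300
# 301 -> 304
```

Whenever the padded size (n + 20) is not a multiple of 4, the result is a few pixels larger than the input. That is why the training script crops `gen_img[:h_sample, :w_sample, :]` before composing the side-by-side sample.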
```python
# Feed both the transformer's output (the stylized image) and the original content image
# into VGG19, take the outputs of the chosen layer, and compute the content loss.
# Note: this script measures content at conv3_3, not conv4_2 as in the method described above.
CONTENT_LAYER = 'conv3_3'
content_endpoints = vgg_endpoints(X - MEAN_VALUES, True)
g_endpoints = vgg_endpoints(g - MEAN_VALUES, True)


def get_content_loss(endpoints_x, endpoints_y, layer_name):
    x = endpoints_x[layer_name]
    y = endpoints_y[layer_name]
    # tf.nn.l2_loss(t) = sum(t ** 2) / 2, so this is the mean squared error
    return 2 * tf.nn.l2_loss(x - y) / tf.to_float(tf.size(x))


content_loss = get_content_loss(content_endpoints, g_endpoints, CONTENT_LAYER)

# Style loss from the outputs of the stylized image and the style image at the style layers
style_loss = []
for layer_name in STYLE_LAYERS:
    layer = g_endpoints[layer_name]
    shape = tf.shape(layer)
    bs, height, width, channel = shape[0], shape[1], shape[2], shape[3]
    features = tf.reshape(layer, (bs, height * width, channel))
    # Batched Gram matrices (bs, C, C), normalized the same way as style_features
    gram = tf.matmul(tf.transpose(features, (0, 2, 1)), features) / tf.to_float(height * width * channel)
    style_gram = style_features[layer_name]  # (C, C), broadcast over the batch
    style_loss.append(2 * tf.nn.l2_loss(gram - style_gram) / tf.to_float(tf.size(layer)))
style_loss = tf.reduce_sum(style_loss)


# Total variation regularization; combined with the other terms into the total loss
def get_total_variation_loss(inputs):
    h = inputs[:, :-1, :, :] - inputs[:, 1:, :, :]  # differences between vertical neighbors
    w = inputs[:, :, :-1, :] - inputs[:, :, 1:, :]  # differences between horizontal neighbors
    return tf.nn.l2_loss(h) / tf.to_float(tf.size(h)) + tf.nn.l2_loss(w) / tf.to_float(tf.size(w))


total_variation_loss = get_total_variation_loss(g)

content_weight = 1
style_weight = 250
total_variation_weight = 0.01
loss = content_weight * content_loss + style_weight * style_loss + total_variation_weight * total_variation_loss
```
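`tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2`, so `2 * tf.nn.l2_loss(x - y) / size` in `get_content_loss` is the mean squared error. The total-variation term averages squared differences between vertically and horizontally adjacent pixels. Here is a plain-Python sketch of both on a single-channel image stored as nested lists. The helper names are hypothetical, and the real code works on 4-D batches:

```python
def l2_loss(values):
    """Same formula as tf.nn.l2_loss: half the sum of squares."""
    return sum(v * v for v in values) / 2


def mse(x, y):
    """2 * l2_loss(x - y) / size, i.e. the mean squared error used for the content loss."""
    diffs = [a - b for a, b in zip(x, y)]
    return 2 * l2_loss(diffs) / len(diffs)


def total_variation(img):
    """Single-channel version of get_total_variation_loss for an H x W nested list."""
    rows, cols = len(img), len(img[0])
    h = [img[r][c] - img[r + 1][c] for r in range(rows - 1) for c in range(cols)]
    w = [img[r][c] - img[r][c + 1] for r in range(rows) for c in range(cols - 1)]
    return l2_loss(h) / len(h) + l2_loss(w) / len(w)


print(mse([1.0, 2.0, 3.0], [1.0, 0.0, 3.0]))  # 1.3333333333333333
print(total_variation([[5, 5], [5, 5]]))      # 0.0: a flat image is perfectly smooth
print(total_variation([[0, 1], [1, 0]]))      # 1.0: a checkerboard is penalized
```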
```python
# Optimizer: lower the total loss by adjusting only the transformer's parameters
vars_t = [var for var in tf.trainable_variables() if var.name.startswith('transformer')]
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=vars_t)

# Train the model. After each epoch, stylize a test image, and write some tensor values
# to an events file so they can be viewed in TensorBoard
style_name = os.path.splitext(os.path.basename(style_images[style_index]))[0]
OUTPUT_DIR = 'samples_%s' % style_name
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

tf.summary.scalar('losses/content_loss', content_loss)
tf.summary.scalar('losses/style_loss', style_loss)
tf.summary.scalar('losses/total_variation_loss', total_variation_loss)
tf.summary.scalar('losses/loss', loss)
tf.summary.scalar('weighted_losses/weighted_content_loss', content_weight * content_loss)
tf.summary.scalar('weighted_losses/weighted_style_loss', style_weight * style_loss)
tf.summary.scalar('weighted_losses/weighted_total_variation_loss', total_variation_weight * total_variation_loss)
tf.summary.image('transformed', g)
tf.summary.image('origin', X)
summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(OUTPUT_DIR)

sess.run(tf.global_variables_initializer())
losses = []
epochs = 2
X_sample = imread('sjtu.jpg')  # test image
h_sample = X_sample.shape[0]
w_sample = X_sample.shape[1]
steps_per_epoch = X_data.shape[0] // batch_size

for e in range(epochs):
    # Shuffle by permuting indices; reordering X_data itself would copy the whole (large) array
    data_index = np.random.permutation(X_data.shape[0])
    for i in tqdm(range(steps_per_epoch)):
        X_batch = X_data[data_index[i * batch_size: (i + 1) * batch_size]]
        ls_, _ = sess.run([loss, optimizer], feed_dict={X: X_batch})
        losses.append(ls_)
        if i > 0 and i % 20 == 0:
            writer.add_summary(sess.run(summary, feed_dict={X: X_batch}), e * steps_per_epoch + i)
            writer.flush()
    print('Epoch %d Loss %f' % (e, np.mean(losses)))
    losses = []

    # Stylize the test image and show it next to the original
    gen_img = sess.run(g, feed_dict={X: [X_sample]})[0]
    gen_img = np.clip(gen_img, 0, 255)
    result = np.zeros((h_sample, w_sample * 2, 3))
    result[:, :w_sample, :] = X_sample / 255.
    # The output can be a few pixels larger than the input, so crop it to the input size
    result[:, w_sample:, :] = gen_img[:h_sample, :w_sample, :] / 255.
    plt.axis('off')
    plt.imshow(result)
    plt.show()
    imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % e), (result * 255).round().astype(np.uint8))

# Save the model
saver = tf.train.Saver()
saver.save(sess, os.path.join(OUTPUT_DIR, 'fast_style_transfer'))
```
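The output directory name is derived from the style image's file name. Deriving it with `style_name[style_name.find('/') + 1:].rstrip('.jpg')` has a pitfall. `str.rstrip` strips any trailing run of the characters '.', 'j', 'p' and 'g' rather than the suffix '.jpg', and `find('/')` misses backslash-separated paths on Windows. `os.path.splitext(os.path.basename(...))` avoids both problems. Here is a quick plain-Python check (the file names are made up for illustration):

```python
import os

for path in ['styles/udnie.jpg', 'styles/painting.jpg']:
    buggy = path[path.find('/') + 1:].rstrip('.jpg')
    fixed = os.path.splitext(os.path.basename(path))[0]
    print(buggy, fixed)
# udnie udnie
# paintin painting
```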




