
Deep Learning and TensorFlow: Neural Style Transfer, How It Works and Implementation Code

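An algorithm that has recently become popular in computer vision is neural style transfer, introduced in the paper "A Neural Algorithm of Artistic Style". Let's take a look at how it works.

We have a content image (my university's main gate, nothing fancy):

A style image:


And the final generated image:


Pretty interesting, right? I'll explain the principle first and give the code afterwards.

Before we get to the principle, make sure you understand how convolutional neural networks (CNNs) work. CNNs are not the focus of this post, so I won't cover them here.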



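As shown in the figure above, we write C for the content image, S for the style image, and G for the generated image.

We define the content loss as J(C,G) and the style loss as J(S,G); the total loss is J(G)=αJ(C,G)+βJ(S,G)

Here α and β are tunable hyperparameters that control the weight of the content term and the style term, respectively.

We use gradient descent to produce the generated image:


Note that gradient descent here optimizes the generated image itself (its pixel values), not the network's weights and biases.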
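To make "gradient descent on the image" concrete, here is a minimal NumPy sketch. It is not the real algorithm: as an assumption for illustration, the feature-based losses are replaced by plain squared pixel distances to C and S, and the function name `optimize_image` is made up. The point is only that the variable being updated is G, and no network weights appear anywhere.

```python
import numpy as np

def optimize_image(content, style, alpha=1.0, beta=1.0, lr=0.1, steps=200):
    """Gradient descent on the pixels of G; no network weights are updated."""
    rng = np.random.default_rng(0)
    g = rng.normal(size=content.shape)          # start from a random image
    for _ in range(steps):
        # Gradient of alpha*||G-C||^2/2 + beta*||G-S||^2/2 with respect to G.
        grad = alpha * (g - content) + beta * (g - style)
        g -= lr * grad                          # update the image itself
    return g

content = np.zeros((4, 4))   # toy "content image"
style = np.ones((4, 4))      # toy "style image"
generated = optimize_image(content, style, alpha=1.0, beta=3.0)
```

With these toy losses the minimum is the weighted average (αC+βS)/(α+β), so raising β pulls the result toward S. That is the same role α and β play in the real loss.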


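So what exactly are the formulas for the content loss J(C,G) and the style loss J(S,G)? Let's start with J(C,G):


Some tips:
1. Use a hidden layer of a CNN to compute the content loss.
2. Use a pretrained model such as VGG instead of training a network from scratch; otherwise, on an ordinary personal computer, training would take far too long to be practical.
3. a[l](C) is the output of layer l of the CNN, after the activation function, for the content image C. Likewise, a[l](G) is the output of layer l, after the activation function, for the generated image G.
4. So the more similar a[l](C) and a[l](G) are, the more similar the content of the two images.
5. How do we measure that similarity? With the content loss J(C,G)=||a[l](C)-a[l](G)||^2/2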

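C is the image that provides the content and G is the generated image; P^l and F^l are their respective responses at layer l. The content loss at layer l is therefore:

$J(C,G) = \frac{1}{2}\sum_{i,j}\left(F^{l}_{ij} - P^{l}_{ij}\right)^{2}$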
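The formula above can be sketched in a few lines of NumPy. The function name `content_loss` and the tiny 2x2 "responses" are made up for illustration; in the real code the responses come from a VGG layer.

```python
import numpy as np

def content_loss(F, P):
    """J(C, G) = 1/2 * sum_ij (F_ij - P_ij)^2 for a single layer."""
    F = np.asarray(F, dtype=np.float64)
    P = np.asarray(P, dtype=np.float64)
    return 0.5 * np.sum((F - P) ** 2)

P = np.array([[1.0, 2.0], [3.0, 4.0]])   # layer response to the content image
F = np.array([[1.0, 0.0], [3.0, 6.0]])   # layer response to the generated image
loss = content_loss(F, P)                # 0.5 * (0 + 4 + 0 + 4) = 4.0
```

The loss is zero exactly when the two responses are identical, and it grows with the squared difference at each position.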


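Now for the style loss J(S,G):

Take some hidden layer of the CNN with height nh, width nw and nc channels. Suppose it has five channels, marked with different colors in the figure above, and look at just the red and yellow channels. Visualizing them, as in the lower-left part of the figure, the red channel has picked up vertical stripes and the yellow channel has picked up an orange background. The question is: how correlated are these two channels?

First, what we mean by correlation: if vertical stripes mostly appear together with an orange background, the correlation is high. If vertical stripes mostly appear without an orange background, the correlation is low.

OK, on to the next figure: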


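Define the indices i, j, k: i runs over the height of this hidden layer, j over its width, and k over its channels. $G^{[l]}_{kk'}$ is the correlation between channel k and channel k' at layer l; a superscript (S) marks the style image and (G) the generated image.

The correlation between channel k and channel k' is computed as shown in the figure above: loop over all i and j (that is, every position in each channel), multiply the two channels' values at that position, and sum the products, i.e. $G^{[l]}_{kk'} = \sum_{i}\sum_{j} a^{[l]}_{ijk}\, a^{[l]}_{ijk'}$.

This matrix is called the style matrix here; in linear algebra it is known as the Gram matrix, which is why it is written G.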
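Here is a small NumPy sketch of that computation, assuming the layer's activations are stored as an array of shape (n_h, n_w, n_c). The name `gram_matrix` and the random activations are illustrative only. Flattening the two spatial axes turns the double sum over i and j into one matrix product.

```python
import numpy as np

def gram_matrix(activations):
    """activations: shape (n_h, n_w, n_c) -> Gram matrix of shape (n_c, n_c)."""
    n_h, n_w, n_c = activations.shape
    feats = activations.reshape(n_h * n_w, n_c)   # one row per position (i, j)
    return feats.T @ feats                        # G[k, k'] = sum_ij a_ijk * a_ijk'

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 4, 5))    # toy layer: height 3, width 4, 5 channels
G = gram_matrix(a)

# The same entry computed literally, as the text describes it.
k, kp = 1, 3
explicit = sum(a[i, j, k] * a[i, j, kp] for i in range(3) for j in range(4))
```

The result is symmetric, since the correlation of k with k' equals that of k' with k, and its diagonal holds each channel's correlation with itself.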


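Let me expand on the linear algebra a little, because this part is easy to get confused by. The Gram matrix:

It is commonly used to measure how correlated a set of vectors are. Here every channel is a matrix, so the Gram matrix is used to measure the correlation between different channels.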

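Each entry of G is the inner product of two channels. The inner product of two n×n matrices A and B is defined as follows (i and j each run from 1 to n):

$A \cdot B = \langle A, B \rangle = \mathrm{Tr}(A^{T}B) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ij} = (\mathrm{vec}\,A)^{T}\,\mathrm{vec}\,B$

This is the inner product of the matrices A and B, where Tr(·) denotes the trace of a matrix.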
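If you want to convince yourself that the three expressions agree, a quick numerical check helps. The two 2x2 matrices below are arbitrary examples.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

via_trace = np.trace(A.T @ B)            # Tr(A^T B)
via_sum = np.sum(A * B)                  # sum_ij a_ij * b_ij
# vec() stacks a matrix into one vector; any stacking order works here
# as long as A and B are flattened the same way.
via_vec = A.reshape(-1) @ B.reshape(-1)  # (vec A)^T vec B
```

All three give 1·5 + 2·6 + 3·7 + 4·8 = 70.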

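If something here is still unclear, look it up, and if that doesn't help, ask me in the comments. You don't need to master the linear algebra to follow the rest.

By the way, what this post calls a "channel" is called a "feature map" in some of the literature.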


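This gives us the style loss: compute the Gram matrix of the generated image (for every pair k, k'), compute the Gram matrix of the style image, and take the squared difference between the two:


The normalization constant in front is optional, because its effect can be absorbed into β. Computing the style loss at several (or all) hidden layers and summing the results works better than using a single layer, so λ is used to set the weight of each layer.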
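The description above can be sketched in NumPy as follows. Two things here are assumptions rather than something the text fixes: the normalization constant 1/(2·n_h·n_w·n_c)², which is one common choice, and the helper names `layer_style_loss` and `style_loss`. The activations are random stand-ins for real layer outputs.

```python
import numpy as np

def gram_matrix(activations):
    n_h, n_w, n_c = activations.shape
    feats = activations.reshape(n_h * n_w, n_c)
    return feats.T @ feats

def layer_style_loss(a_style, a_generated, normalize=True):
    """Squared difference of the two Gram matrices at one layer."""
    n_h, n_w, n_c = a_style.shape
    diff = gram_matrix(a_style) - gram_matrix(a_generated)
    loss = np.sum(diff ** 2)
    if normalize:
        # Assumed constant; it only rescales the loss, like beta does.
        loss /= (2.0 * n_h * n_w * n_c) ** 2
    return loss

def style_loss(style_acts, generated_acts, layer_weights):
    """Weighted sum over layers; layer_weights plays the role of lambda."""
    return sum(lam * layer_style_loss(s, g)
               for s, g, lam in zip(style_acts, generated_acts, layer_weights))

rng = np.random.default_rng(0)
style_acts = [rng.normal(size=(4, 4, 3)), rng.normal(size=(2, 2, 6))]
generated_acts = [rng.normal(size=(4, 4, 3)), rng.normal(size=(2, 2, 6))]
total = style_loss(style_acts, generated_acts, [0.5, 0.5])
```

Setting a layer's weight to 0 removes it from the sum, which is how λ lets you pick which layers contribute to the style.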

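The total loss is J(G)=αJ(C,G)+βJ(S,G)

Now for the code:

First, credit where it is due: this code is adapted from github: https://github.com/anishathalye/neural-style. Thanks to the author for open-sourcing it.

If you would rather not wade through the English original, and want more readable Chinese comments and a program you can run and debug directly, the new code is on github: https://github.com/wenqiwenqi1/neural-style

Also download the pretrained VGG19 model and put it in the root directory of the code: http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat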


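Environment
- Python 3.5
- TensorFlow 1.3
- VGG19

VGG19 network structure

  Each layer of the network uses the previous layer's output to extract progressively more complex features, until the features are complex enough to recognize objects. Each layer can therefore be seen as a bank of local feature extractors. VGG19 was far more accurate at object recognition than the algorithms that preceded deep learning, and object recognition systems have largely switched to deep learning since. The structure of VGG19 is as follows:

(Figure: VGG19 network structure. Image from https://zhuanlan.zhihu.com/p/26746283)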

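Code walkthrough:

The source consists of four files: neural_style.py, stylize.py, vgg.py and begin.py

neural_style.py: the external interface. It defines the main parameters and the defaults for some of them, reads and saves images, resizes the input images, assigns weights, and passes the parameters and the resized images on to stylize.py. It is meant to be run from the command line.

stylize.py: the core code, containing the training and optimization process.

vgg.py: defines the network model and the related operations.

begin.py: can be run directly from an IDE, which is convenient for debugging.

The vgg.py code below loads the VGG-19 network, which is used to build the Neural Style model.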

import tensorflow as tf
import numpy as np
import scipy.io

VGG19_LAYERS = (
    'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
    'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
    'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
    'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
    'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
    'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
    'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
    'relu5_3', 'conv5_4', 'relu5_4'
)


# What we need from the file are the kernels and bias of each layer.
def load_net(data_path):
    data = scipy.io.loadmat(data_path)
    # Check that the expected variable names are present in the loaded dict.
    if not all(i in data for i in ('layers', 'classes', 'normalization')):
        raise ValueError("You're using the wrong VGG19 data. Please follow the instructions in the README to download the correct data.")
    mean = data['normalization'][0][0][0]
    # Average over axes 0 and 1 (height and width): one mean per color channel.
    mean_pixel = np.mean(mean, axis=(0, 1))
    weights = data['layers'][0]
    return weights, mean_pixel


def net_preloaded(weights, input_image, pooling):
    net = {}
    current = input_image
    for i, name in enumerate(VGG19_LAYERS):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            # The two layouts differ, so swap the first two axes.
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current)
        elif kind == 'pool':
            current = _pool_layer(current, pooling)
        net[name] = current
    assert len(net) == len(VGG19_LAYERS)
    return net


def _conv_layer(input, weights, bias):
    conv = tf.nn.conv2d(input, tf.constant(weights), strides=(1, 1, 1, 1),
                        padding='SAME')
    return tf.nn.bias_add(conv, bias)


def _pool_layer(input, pooling):
    if pooling == 'avg':
        return tf.nn.avg_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                              padding='SAME')
    else:
        return tf.nn.max_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                              padding='SAME')


def preprocess(image, mean_pixel):
    return image - mean_pixel


def unprocess(image, mean_pixel):
    return image + mean_pixel
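To see what `mean_pixel`, `preprocess` and `unprocess` do without loading the real .mat file, here is a small stand-alone sketch. The mean image and its values (120, 110, 100) are invented for illustration and are not the actual VGG19 statistics.

```python
import numpy as np

def preprocess(image, mean_pixel):
    return image - mean_pixel

def unprocess(image, mean_pixel):
    return image + mean_pixel

# Hypothetical stand-in for data['normalization'][0][0][0]: an H x W x 3 mean image.
mean_image = np.zeros((224, 224, 3))
mean_image[..., 0] = 120.0
mean_image[..., 1] = 110.0
mean_image[..., 2] = 100.0

# Averaging over height and width leaves one value per color channel.
mean_pixel = np.mean(mean_image, axis=(0, 1))   # shape (3,)

image = np.full((2, 2, 3), 128.0)               # tiny uniform gray image
centered = preprocess(image, mean_pixel)        # broadcast over every pixel
restored = unprocess(centered, mean_pixel)      # undoes preprocess
```

Because `mean_pixel` has shape (3,), NumPy broadcasting subtracts it from every pixel, and `unprocess` is the exact inverse. That is why the optimized image can be converted back to ordinary pixel values at the end.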
neural_style.py defines a number of command-line arguments and preprocesses them:

# Copyright (c) 2015-2017 Anish Athalye. Released under GPLv3.
import os
import numpy as np
# scipy.misc.imread/imresize exist in the SciPy versions contemporary with the
# environment above; later SciPy releases removed them.
import scipy.misc
from stylize import stylize
import math
from argparse import ArgumentParser
from PIL import Image

# default arguments
CONTENT_WEIGHT = 5e0
CONTENT_WEIGHT_BLEND = 1
STYLE_WEIGHT = 5e2
TV_WEIGHT = 1e2
STYLE_LAYER_WEIGHT_EXP = 1
LEARNING_RATE = 1e1
BETA1 = 0.9
BETA2 = 0.999
EPSILON = 1e-08
STYLE_SCALE = 1.0
ITERATIONS = 1000
VGG_PATH = 'imagenet-vgg-verydeep-19.mat'
POOLING = 'max'


def build_parser():
    parser = ArgumentParser()
    # ArgumentParser is Python's tool for reading command-line arguments.
    # required - whether the option may be omitted (optional arguments only).
    # help - a short description of the argument.
    # metavar - the argument's name in help messages.
    # dest - name of the attribute added to the object returned by parse_args().
    parser.add_argument('--content',
            dest='content', help='content image',
            metavar='CONTENT', required=True)
    parser.add_argument('--styles',
            dest='styles',
            nargs='+', help='one or more style images',
            metavar='STYLE', required=True)
    parser.add_argument('--output',
            dest='output', help='output path',
            metavar='OUTPUT', required=True)
    parser.add_argument('--iterations', type=int,
            dest='iterations', help='iterations (default %(default)s)',
            metavar='ITERATIONS', default=ITERATIONS)
    parser.add_argument('--print-iterations', type=int,
            dest='print_iterations', help='statistics printing frequency',
            metavar='PRINT_ITERATIONS')
    parser.add_argument('--checkpoint-output',
            dest='checkpoint_output', help='checkpoint output format, e.g. output%%s.jpg',
            metavar='OUTPUT')
    parser.add_argument('--checkpoint-iterations', type=int,
            dest='checkpoint_iterations', help='checkpoint frequency',
            metavar='CHECKPOINT_ITERATIONS')
    parser.add_argument('--width', type=int,
            dest='width', help='output width',
            metavar='WIDTH')
    parser.add_argument('--style-scales', type=float,
            dest='style_scales',
            nargs='+', help='one or more style scales',
            metavar='STYLE_SCALE')
    parser.add_argument('--network',
            dest='network', help='path to network parameters (default %(default)s)',
            metavar='VGG_PATH', default=VGG_PATH)
    parser.add_argument('--content-weight-blend', type=float,
            dest='content_weight_blend', help='content weight blend, conv4_2 * blend + conv5_2 * (1-blend) (default %(default)s)',
            metavar='CONTENT_WEIGHT_BLEND', default=CONTENT_WEIGHT_BLEND)
    parser.add_argument('--content-weight', type=float,
            dest='content_weight', help='content weight (default %(default)s)',
            metavar='CONTENT_WEIGHT', default=CONTENT_WEIGHT)
    parser.add_argument('--style-weight', type=float,
            dest='style_weight', help='style weight (default %(default)s)',
            metavar='STYLE_WEIGHT', default=STYLE_WEIGHT)
    parser.add_argument('--style-layer-weight-exp', type=float,
            dest='style_layer_weight_exp', help='style layer weight exponentional increase - weight(layer<n+1>) = weight_exp*weight(layer<n>) (default %(default)s)',
            metavar='STYLE_LAYER_WEIGHT_EXP', default=STYLE_LAYER_WEIGHT_EXP)
    parser.add_argument('--style-blend-weights', type=float,
            dest='style_blend_weights', help='style blending weights',
            nargs='+', metavar='STYLE_BLEND_WEIGHT')
    parser.add_argument('--tv-weight', type=float,
            dest='tv_weight', help='total variation regularization weight (default %(default)s)',
            metavar='TV_WEIGHT', default=TV_WEIGHT)
    parser.add_argument('--learning-rate', type=float,
            dest='learning_rate', help='learning rate (default %(default)s)',
            metavar='LEARNING_RATE', default=LEARNING_RATE)
    parser.add_argument('--beta1', type=float,
            dest='beta1', help='Adam: beta1 parameter (default %(default)s)',
            metavar='BETA1', default=BETA1)
    parser.add_argument('--beta2', type=float,
            dest='beta2', help='Adam: beta2 parameter (default %(default)s)',
            metavar='BETA2', default=BETA2)
    parser.add_argument('--eps', type=float,
            dest='epsilon', help='Adam: epsilon parameter (default %(default)s)',
            metavar='EPSILON', default=EPSILON)
    parser.add_argument('--initial',
            dest='initial', help='initial image',
            metavar='INITIAL')
    parser.add_argument('--initial-noiseblend', type=float,
            dest='initial_noiseblend', help='ratio of blending initial image with normalized noise (if no initial image specified, content image is used) (default %(default)s)',
            metavar='INITIAL_NOISEBLEND')
    parser.add_argument('--preserve-colors', action='store_true',
            dest='preserve_colors', help='style-only transfer (preserving colors) - if color transfer is not needed')
    parser.add_argument('--pooling',
            dest='pooling', help='pooling layer configuration: max or avg (default %(default)s)',
            metavar='POOLING', default=POOLING)
    return parser


def main():
    parser = build_parser()        # create the parser
    options = parser.parse_args()  # read the arguments

    if not os.path.isfile(options.network):  # the pretrained model is missing
        parser.error("Network %s does not exist. (Did you forget to download it?)" % options.network)

    content_image = imread(options.content)                     # read the content image
    style_images = [imread(style) for style in options.styles]  # read one or more style images

    width = options.width
    if width is not None:  # a width was given: resize the content image, keeping its aspect ratio
        new_shape = (int(math.floor(float(content_image.shape[0]) /
                content_image.shape[1] * width)), width)
        content_image = scipy.misc.imresize(content_image, new_shape)
    target_shape = content_image.shape  # shape of the content image after resizing
    for i in range(len(style_images)):  # rescale the style images
        style_scale = STYLE_SCALE
        if options.style_scales is not None:
            style_scale = options.style_scales[i]
        style_images[i] = scipy.misc.imresize(style_images[i], style_scale *
                target_shape[1] / style_images[i].shape[1])

    # Weights of the individual style images, i.e. how much each one counts.
    style_blend_weights = options.style_blend_weights
    if style_blend_weights is None:
        # default is equal weights
        style_blend_weights = [1.0/len(style_images) for _ in style_images]
    else:
        total_blend_weight = sum(style_blend_weights)
        style_blend_weights = [weight/total_blend_weight
                               for weight in style_blend_weights]

    initial = options.initial  # the initial image
    if initial is not None:
        initial = scipy.misc.imresize(imread(initial), content_image.shape[:2])
        # Initial guess is specified, but not noiseblend - no noise should be blended
        if options.initial_noiseblend is None:
            options.initial_noiseblend = 0.0
    else:
        # Neither inital, nor noiseblend is provided, falling back to random generated initial guess
        if options.initial_noiseblend is None:
            options.initial_noiseblend = 1.0
        if options.initial_noiseblend < 1.0:
            initial = content_image

    # Intermediate images are requested but the pattern has no %s in it.
    if options.checkpoint_output and "%s" not in options.checkpoint_output:
        parser.error("To save intermediate images, the checkpoint output "
                     "parameter must contain `%s` (e.g. `foo%s.jpg`)")

    for iteration, image in stylize(
        network=options.network,
        initial=initial,
        initial_noiseblend=options.initial_noiseblend,
        content=content_image,
        styles=style_images,
        preserve_colors=options.preserve_colors,
        iterations=options.iterations,
        content_weight=options.content_weight,
        content_weight_blend=options.content_weight_blend,
        style_weight=options.style_weight,
        style_layer_weight_exp=options.style_layer_weight_exp,
        style_blend_weights=style_blend_weights,
        tv_weight=options.tv_weight,
        learning_rate=options.learning_rate,
        beta1=options.beta1,
        beta2=options.beta2,
        epsilon=options.epsilon,
        pooling=options.pooling,
        print_iterations=options.print_iterations,
        checkpoint_iterations=options.checkpoint_iterations
    ):
        output_file = None
        combined_rgb = image
        if iteration is not None:
            if options.checkpoint_output:
                output_file = options.checkpoint_output % iteration  # save a checkpoint image for this iteration
        else:
            output_file = options.output
        if output_file:
            imsave(output_file, combined_rgb)


def imread(path):  # read an image
    img = scipy.misc.imread(path).astype(np.float64)
    if len(img.shape) == 2:
        # grayscale
        img = np.dstack((img,img,img))
    elif img.shape[2] == 4:
        # PNG with alpha channel
        img = img[:,:,:3]
    return img


def imsave(path, img):  # save an image
    img = np.clip(img, 0, 255).astype(np.uint8)
    Image.fromarray(img).save(path, quality=95)


if __name__ == '__main__':
    main()
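Two small pieces of arithmetic in `main()` are easy to miss: how the new content shape is derived from `--width`, and how the style blend weights are normalized. The sketch below pulls them out into stand-alone functions. The names `scaled_shape` and `normalize_blend_weights` are mine and do not exist in the original code.

```python
import math

def scaled_shape(height, width, new_width):
    """New (height, width) that keeps the aspect ratio for a requested width."""
    new_height = int(math.floor(float(height) / width * new_width))
    return (new_height, new_width)

def normalize_blend_weights(weights, n_styles):
    """Equal weights by default; otherwise rescale so the weights sum to 1."""
    if weights is None:
        return [1.0 / n_styles] * n_styles
    total = sum(weights)
    return [w / total for w in weights]

shape = scaled_shape(400, 800, 200)                # a 400x800 image resized to width 200
equal = normalize_blend_weights(None, 4)           # four styles, no weights given
custom = normalize_blend_weights([2.0, 6.0], 2)    # second style counts three times as much
```

Here `shape` is (100, 200), `equal` is four weights of 0.25, and `custom` is [0.25, 0.75]. Only the ratio between the blend weights matters, not their absolute size.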
The core code, stylize.py:

import vgg
import tensorflow as tf
import numpy as np
from sys import stderr
from PIL import Image

CONTENT_LAYERS = ('relu4_2', 'relu5_2')
STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')

try:
    reduce
except NameError:
    from functools import reduce


def stylize(network, initial, initial_noiseblend, content, styles, preserve_colors, iterations,
        content_weight, content_weight_blend, style_weight, style_layer_weight_exp, style_blend_weights, tv_weight,
        learning_rate, beta1, beta2, epsilon, pooling,
        print_iterations=None, checkpoint_iterations=None):
    """
    Stylize images.
    This function yields tuples (iteration, image); `iteration` is None
    if this is the final image (the last iteration). Other tuples are yielded
    every `checkpoint_iterations` iterations.
    :rtype: iterator[tuple[int|None,image]]
    """
    shape = (1,) + content.shape  # e.g. content.shape=(356, 600, 3) gives shape=(1, 356, 600, 3)
    style_shapes = [(1,) + style.shape for style in styles]
    content_features = {}                  # feature maps of the content image
    style_features = [{} for _ in styles]  # Gram matrices of each style image

    vgg_weights, vgg_mean_pixel = vgg.load_net(network)  # load the pretrained model: weights and mean_pixel

    layer_weight = 1.0
    style_layers_weights = {}
    for style_layer in STYLE_LAYERS:
        style_layers_weights[style_layer] = layer_weight
        # With style_layer_weight_exp set, the layer weights grow exponentially;
        # the default of 1 keeps them all equal.
        layer_weight *= style_layer_weight_exp

    # normalize style layer weights
    layer_weights_sum = 0
    for style_layer in STYLE_LAYERS:
        layer_weights_sum += style_layers_weights[style_layer]
    for style_layer in STYLE_LAYERS:
        style_layers_weights[style_layer] /= layer_weights_sum  # rescale so the weights sum to 1

    # Create a placeholder for the image, feed content_pre to it through the
    # feed_dict of eval(), and run the net to get the content feature maps.
    # compute content features in feedforward mode
    g = tf.Graph()
    with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
        image = tf.placeholder('float', shape=shape)
        net = vgg.net_preloaded(vgg_weights, image, pooling)  # the whole network is built here
        content_pre = np.array([vgg.preprocess(content, vgg_mean_pixel)])  # content - vgg_mean_pixel
        for layer in CONTENT_LAYERS:
            content_features[layer] = net[layer].eval(feed_dict={image: content_pre})
            # print(layer,content_features[layer].shape)

    # compute style features in feedforward mode
    # Several style images may be given, hence the loop. Style is the inner
    # product between filter responses, so one extra step turns the feature
    # maps into a Gram matrix, which is stored as the style feature.
    for i in range(len(styles)):
        g = tf.Graph()
        with g.as_default(), g.device('/cpu:0'), tf.Session() as sess:
            image = tf.placeholder('float', shape=style_shapes[i])
            net = vgg.net_preloaded(vgg_weights, image, pooling)  # pooling defaults to max
            style_pre = np.array([vgg.preprocess(styles[i], vgg_mean_pixel)])  # styles[i] - vgg_mean_pixel
            for layer in STYLE_LAYERS:
                features = net[layer].eval(feed_dict={image: style_pre})
                features = np.reshape(features, (-1, features.shape[3]))  # one column per channel
                gram = np.matmul(features.T, features) / features.size    # Gram matrix
                style_features[i][layer] = gram

    initial_content_noise_coeff = 1.0 - initial_noiseblend

    # make stylized image using backpropogation
    with tf.Graph().as_default():
        if initial is None:
            noise = np.random.normal(size=shape, scale=np.std(content) * 0.1)  # note: `noise` is not used below
            initial = tf.random_normal(shape) * 0.256  # initialize the image with noise
        else:
            initial = np.array([vgg.preprocess(initial, vgg_mean_pixel)])
            initial = initial.astype('float32')
            noise = np.random.normal(size=shape, scale=np.std(content) * 0.1)  # note: `noise` is not used below
            initial = (initial) * initial_content_noise_coeff + (tf.random_normal(shape) * 0.256) * (1.0 - initial_content_noise_coeff)
        # This TensorFlow variable is the thing being trained. Note that we
        # train an image here, not weights and biases.
        image = tf.Variable(initial)
        net = vgg.net_preloaded(vgg_weights, image, pooling)  # feature maps of the generated image

        # content loss
        content_layers_weights = {}
        # content weight blend: relu4_2 * blend + relu5_2 * (1 - blend);
        # the default blend of 1 uses only relu4_2.
        content_layers_weights['relu4_2'] = content_weight_blend
        content_layers_weights['relu5_2'] = 1.0 - content_weight_blend

        content_loss = 0
        content_losses = []
        for content_layer in CONTENT_LAYERS:
            # generated features minus content features;
            # tf.nn.l2_loss computes sum(t ** 2) / 2
            content_losses.append(content_layers_weights[content_layer] * content_weight * (2 * tf.nn.l2_loss(
                    net[content_layer] - content_features[content_layer]) /
                    content_features[content_layer].size))
        content_loss += reduce(tf.add, content_losses)

        # style loss
        style_loss = 0
        for i in range(len(styles)):
            style_losses = []
            for style_layer in STYLE_LAYERS:
                layer = net[style_layer]
                _, height, width, number = map(lambda i: i.value, layer.get_shape())
                size = height * width * number
                feats = tf.reshape(layer, (-1, number))
                gram = tf.matmul(tf.transpose(feats), feats) / size  # Gram matrix of the generated image
                style_gram = style_features[i][style_layer]
                style_losses.append(style_layers_weights[style_layer] * 2 * tf.nn.l2_loss(gram - style_gram) / style_gram.size)
            style_loss += style_weight * style_blend_weights[i] * reduce(tf.add, style_losses)

        # total variation denoising
        tv_y_size = _tensor_size(image[:,1:,:,:])
        tv_x_size = _tensor_size(image[:,:,1:,:])
        tv_loss = tv_weight * 2 * (
                (tf.nn.l2_loss(image[:,1:,:,:] - image[:,:shape[1]-1,:,:]) /
                    tv_y_size) +
                (tf.nn.l2_loss(image[:,:,1:,:] - image[:,:,:shape[2]-1,:]) /
                    tv_x_size))

        # overall loss
        # The content and style losses follow the formulas in the text; the code
        # also adds total variation denoising, so the total loss is the sum of
        # all three.
        loss = content_loss + style_loss + tv_loss

        # optimizer setup
        # train_step uses the Adam optimizer to minimize the loss above; running
        # it repeatedly yields `best`, the result of the optimization.
        train_step = tf.train.AdamOptimizer(learning_rate, beta1, beta2, epsilon).minimize(loss)

        def print_progress():
            stderr.write(' content loss: %g\n' % content_loss.eval())
            stderr.write(' style loss: %g\n' % style_loss.eval())
            stderr.write(' tv loss: %g\n' % tv_loss.eval())
            stderr.write(' total loss: %g\n' % loss.eval())

        # optimization
        best_loss = float('inf')
        best = None
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            stderr.write('Optimization started...\n')
            if (print_iterations and print_iterations != 0):
                print_progress()
            for i in range(iterations):
                stderr.write('Iteration %4d/%4d\n' % (i + 1, iterations))
                train_step.run()

                last_step = (i == iterations - 1)
                if last_step or (print_iterations and i % print_iterations == 0):
                    print_progress()

                if (checkpoint_iterations and i % checkpoint_iterations == 0) or last_step:
                    this_loss = loss.eval()
                    if this_loss < best_loss:
                        best_loss = this_loss
                        best = image.eval()

                    img_out = vgg.unprocess(best.reshape(shape[1:]), vgg_mean_pixel)  # add the mean pixel back

                    if preserve_colors and preserve_colors == True:
                        original_image = np.clip(content, 0, 255)
                        styled_image = np.clip(img_out, 0, 255)

                        # Luminosity transfer steps:
                        # 1. Convert stylized RGB->grayscale accoriding to Rec.601 luma (0.299, 0.587, 0.114)
                        # 2. Convert stylized grayscale into YUV (YCbCr)
                        # 3. Convert original image into YUV (YCbCr)
                        # 4. Recombine (stylizedYUV.Y, originalYUV.U, originalYUV.V)
                        # 5. Convert recombined image from YUV back to RGB

                        # 1
                        styled_grayscale = rgb2gray(styled_image)
                        styled_grayscale_rgb = gray2rgb(styled_grayscale)

                        # 2
                        styled_grayscale_yuv = np.array(Image.fromarray(styled_grayscale_rgb.astype(np.uint8)).convert('YCbCr'))

                        # 3
                        original_yuv = np.array(Image.fromarray(original_image.astype(np.uint8)).convert('YCbCr'))

                        # 4
                        w, h, _ = original_image.shape
                        combined_yuv = np.empty((w, h, 3), dtype=np.uint8)
                        combined_yuv[..., 0] = styled_grayscale_yuv[..., 0]
                        combined_yuv[..., 1] = original_yuv[..., 1]
                        combined_yuv[..., 2] = original_yuv[..., 2]

                        # 5
                        img_out = np.array(Image.fromarray(combined_yuv, 'YCbCr').convert('RGB'))

                    yield (  # like return, but the function resumes on the next iteration
                        (None if last_step else i),
                        img_out
                    )


def _tensor_size(tensor):
    from operator import mul
    return reduce(mul, (d.value for d in tensor.get_shape()), 1)


def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])


def gray2rgb(gray):
    w, h = gray.shape
    rgb = np.empty((w, h, 3), dtype=np.float32)
    rgb[:, :, 2] = rgb[:, :, 1] = rgb[:, :, 0] = gray
    return rgb
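The total variation term is the one part of the loss that the theory section did not cover. As a rough NumPy re-implementation of the same expression (my own sketch, not code from the repository), it penalizes differences between neighbouring pixels, which smooths out noise in the generated image:

```python
import numpy as np

def l2_loss(t):
    """NumPy equivalent of tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2.0

def tv_loss(image, tv_weight=1.0):
    """image: array of shape (1, height, width, channels)."""
    dy = image[:, 1:, :, :] - image[:, :-1, :, :]   # vertical neighbours
    dx = image[:, :, 1:, :] - image[:, :, :-1, :]   # horizontal neighbours
    return tv_weight * 2 * (l2_loss(dy) / dy.size + l2_loss(dx) / dx.size)

flat = np.full((1, 4, 5, 1), 7.0)                            # constant image
ramp = np.tile(np.arange(5.0), (4, 1)).reshape(1, 4, 5, 1)   # brightness rises by 1 per column
flat_tv = tv_loss(flat)
ramp_tv = tv_loss(ramp)
```

A constant image has zero total variation. The ramp changes by exactly 1 between horizontal neighbours and not at all vertically, so its loss is the mean squared horizontal difference, 1.0.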
begin.py starts the training. The image paths are preset and everything else uses the default settings, which you can change as you like:

import os
import numpy as np
import scipy.misc
from stylize import stylize
import math
from argparse import ArgumentParser
from PIL import Image

# default arguments
CONTENT_WEIGHT = 5e0
CONTENT_WEIGHT_BLEND = 1
STYLE_WEIGHT = 5e2
TV_WEIGHT = 1e2
STYLE_LAYER_WEIGHT_EXP = 1
LEARNING_RATE = 1e1
BETA1 = 0.9
BETA2 = 0.999
EPSILON = 1e-08
STYLE_SCALE = 1.0
ITERATIONS = 1000
VGG_PATH = 'imagenet-vgg-verydeep-19.mat'
POOLING = 'max'


def imread(path):  # read an image
    img = scipy.misc.imread(path).astype(np.float64)
    if len(img.shape) == 2:
        # grayscale
        img = np.dstack((img,img,img))
    elif img.shape[2] == 4:
        # PNG with alpha channel
        img = img[:,:,:3]
    return img


content = 'examples/beili.jpg'      # path of the content image; change as needed
styles = ['examples/1-style.jpg']   # path(s) of the style image(s); change as needed
content_image = imread(content)                      # read the content image
style_images = [imread(style) for style in styles]   # read one or more style images
initial_noiseblend = 1.0
initial = content_image
style_blend_weights = [1.0/len(style_images) for _ in style_images]

for iteration, image in stylize(
    network=VGG_PATH,
    initial=initial,
    initial_noiseblend=initial_noiseblend,
    content=content_image,
    styles=style_images,
    preserve_colors=None,
    iterations=ITERATIONS,
    content_weight=CONTENT_WEIGHT,
    content_weight_blend=CONTENT_WEIGHT_BLEND,
    style_weight=STYLE_WEIGHT,
    style_layer_weight_exp=STYLE_LAYER_WEIGHT_EXP,
    style_blend_weights=style_blend_weights,
    tv_weight=TV_WEIGHT,
    learning_rate=LEARNING_RATE,
    beta1=BETA1,
    beta2=BETA2,
    epsilon=EPSILON,
    pooling=POOLING,
    print_iterations=None,
    checkpoint_iterations=None
):
    # `image` holds the generated picture; this script only prints the
    # iteration and does not save the image.
    print(iteration)
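Both scripts consume `stylize` as a generator of `(iteration, image)` tuples, where `iteration` is None for the final image. The sketch below imitates only that protocol so it can run without TensorFlow. `fake_stylize` and `output_paths` are hypothetical helpers, and the "images" are placeholder strings.

```python
def fake_stylize(iterations, checkpoint_iterations=None):
    """Hypothetical stand-in that mimics only the (iteration, image) protocol."""
    for i in range(iterations):
        last_step = (i == iterations - 1)
        if (checkpoint_iterations and i % checkpoint_iterations == 0) or last_step:
            yield (None if last_step else i), 'image@%d' % i

def output_paths(results, output, checkpoint_output=None):
    """Decide where each yielded image would be written, as main() does."""
    paths = []
    for iteration, image in results:
        if iteration is not None:
            if checkpoint_output:                  # intermediate checkpoint
                paths.append(checkpoint_output % iteration)
        else:                                      # iteration is None: final image
            paths.append(output)
    return paths

paths = output_paths(fake_stylize(10, checkpoint_iterations=4),
                     'out.jpg', 'ckpt%s.jpg')
final_only = output_paths(fake_stylize(3), 'out.jpg')
```

With 10 iterations and a checkpoint every 4, checkpoints are produced at iterations 0, 4 and 8, followed by the final image. Without `checkpoint_iterations`, only the final image is yielded, which is why begin.py prints a single None.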
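To start the program from the command line instead, enter, for example:
python neural_style.py --content examples/beili.jpg --styles ./examples/1-style.jpg --output ./examples/beili-output.jpg
and the program will run. See the README for details. If you have any questions, feel free to ask in the comments.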

