
What Is a Residual Network (ResNet)?

Residual Networks

1. Residuals

In statistics, a residual is the difference between an observed value and the estimated (fitted) value. In ensemble learning, fitting the residuals with base models makes the ensemble more accurate; in deep learning, layers have likewise been used to fit residuals and thereby improve the performance of deep neural networks. Here I use two algorithms, Gradient Boosting and ResNet, to give a more intuitive sense of how fitting residuals works.

2. Gradient Boosting

The Gradient Boosting procedure can roughly be summarized in three steps:

  1. Train a base learner Tree_1 (a decision tree here) to fit the data and its labels.
  2. Train a second base learner Tree_2: its input is still the data, but its target is the difference (the residual) between the labels and the predictions of Tree_1. In short, this step uses a base learner to learn the residual.
  3. Finally, add up the predictions of all the base learners to make the final decision.

The code below performs only three rounds of residual fitting; the last line shows the ensemble-learning aspect, combining several base learners into one composite model.

from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(X, y)

y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(X, y2)

y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(X, y3)

y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))

The code above is in fact equivalent to calling sklearn's ensemble API GradientBoostingRegressor with the number of base learners n_estimators set to 3:

from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
gbrt.fit(X, y)
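As a quick sanity check of that equivalence (a minimal sketch assuming the X, y, and X_new arrays plus the fitted models from the two snippets above), the two ensembles should produce essentially identical predictions:

import numpy as np

manual_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))
gbrt_pred = gbrt.predict(X_new)

# With learning_rate=1.0, the same max_depth and the same splits, the two should agree.
print(np.allclose(manual_pred, gbrt_pred))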

An intuitive picture of Gradient Boosting: it is like shooting several arrows at the same target. If the previous arrow lands a bit to the right, the next shot aims a bit further left, and the aim is adjusted step by step until the arrows land on the bullseye. This is also why the boosting style of ensembling reduces model bias.

Why residual networks work:

1) Why does residual learning work so well? Compared with earlier work, deep residual learning uses a much deeper network, and residual learning is precisely what makes that extra depth possible. But why is depth so important?

Answer: each layer of a neural network is generally thought to extract features at a different level of abstraction: low-level, mid-level, and high-level. The deeper the network, the more levels of features it can extract, and the more ways those features from different levels can be combined.

2) Why, before residual networks, did the deepest networks top out at GoogLeNet's 22 layers, while ResNets reach 152 layers, or even more than 1000?

Answer: the main obstacles to increasing depth are vanishing and exploding gradients. The traditional remedies are careful weight initialization (normalized initialization) and batch normalization. These do tame the gradient problem and allow deeper networks, but they expose another problem: degradation. As depth increases further, the error rate goes back up. Residual learning was designed to solve this degradation problem; in doing so it also helps with the gradient problem and improves the network's overall performance.

The basic structure of a residual network:

[Figure: a residual building block — a few stacked weight layers whose output F(x) is added to the identity shortcut x]

The input is added to the output of the stacked layers. For a stack of layers (several layers piled together) with input x, denote the mapping it should learn by H(x). Instead of fitting H(x) directly, we let the stack fit the residual F(x) = H(x) - x, so the original mapping becomes F(x) + x. If the residual is zero, the stack simply performs an identity mapping, so at worst the network's performance does not degrade; in practice the residual is never exactly zero, so the stacked layers learn new features on top of the input features and the network performs better. Intuitively, when the ideal mapping is close to the identity, it is easier for the optimizer to push F(x) toward zero than to fit an identity mapping with a stack of nonlinear layers.

The code below makes the operations inside a residual block concrete:

  1. Take the input x.
  2. Pass x through three convolution layers to obtain an output m.
  3. Add the original input x and the output m.

The sum is the overall output of the residual block; in other words, the three convolution layers are used to fit m, the residual between the block's output and its input.

from keras.layers import Conv2D, add

def residual_block(x, f=32, r=4):
    """
    residual block
    :param x: the input tensor (it must already have f channels so the add is valid)
    :param f: the number of output filters
    :param r: the bottleneck reduction ratio
    :return: the block output, i.e. the input plus the learned residual
    """
    m = Conv2D(f // r, 1, padding='same')(x)   # 1x1 conv: reduce channels to f // r
    m = Conv2D(f // r, 3, padding='same')(m)   # 3x3 conv in the reduced-channel space
    m = Conv2D(f, 1, padding='same')(m)        # 1x1 conv: restore channels to f
    return add([x, m])                         # skip connection: x + residual
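A minimal usage sketch (the shapes below are my own illustrative choices, assuming a feature map that already has f = 32 channels so the element-wise add is shape-compatible):

from keras.layers import Input
from keras.models import Model

x_in = Input(shape=(32, 32, 32))          # hypothetical 32x32 feature map with 32 channels
y_out = residual_block(x_in, f=32, r=4)
model = Model(inputs=x_in, outputs=y_out)
model.summary()                            # the output shape matches the input shape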

The residual idea in ResNet is to strip away the part of the mapping already captured by the identity, so that the small remaining changes stand out and the model can concentrate on learning them. This is almost exactly the same idea as using a base learner to learn the residual in the Gradient Boosting discussion above.

 

3. The slim Library

Before studying the residual network itself, it helps to learn how the slim library is used.

First, let's look at how a layer, for example a convolution layer, is written in plain TensorFlow:

input = ...
with tf.name_scope('conv1_1') as scope:
  kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                           stddev=1e-1), name='weights')
  conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')
  biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                       trainable=True, name='biases')
  bias = tf.nn.bias_add(conv, biases)
  conv1 = tf.nn.relu(bias, name=scope)

 

And here is the slim version:

input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')

That by itself is not the main attraction, since TensorFlow nowadays also ships simple wrappers for most layers. What is more appealing are slim's repeat and stack operations.

Suppose we define three identical convolution layers:

net = ...
net = slim.conv2d(net, 256, [3, 3], scope='conv3_1')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_2')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

slim's repeat operation cuts down the amount of code:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

stack handles the case where the kernel sizes or output widths differ from layer to layer.

Suppose we define three fully connected (FC) layers:

x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')

With the stack operation:

slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')

The same works for convolution layers:

# The verbose way:
x = slim.conv2d(x, 32, [3, 3], scope='core/core_1')
x = slim.conv2d(x, 32, [1, 1], scope='core/core_2')
x = slim.conv2d(x, 64, [3, 3], scope='core/core_3')
x = slim.conv2d(x, 64, [1, 1], scope='core/core_4')
# The compact way:
slim.stack(x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core')

With these tools, defining a VGG network takes only a dozen or so lines of code:

def vgg16(inputs):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net
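One piece of the VGG snippet that has not been introduced yet is slim.arg_scope: it sets default keyword arguments for the listed ops so they do not have to be repeated on every call. A minimal sketch of how it behaves (the input shape and layer widths below are my own illustrative choices):

import tensorflow as tf
import tensorflow.contrib.slim as slim

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])  # hypothetical input batch

# Every slim.conv2d inside the scope inherits these keyword arguments by default;
# an individual call can still override them, as conv3 does with its padding.
with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.conv2d(inputs, 64, [3, 3], scope='conv1')
    net = slim.conv2d(net, 128, [3, 3], scope='conv2')
    net = slim.conv2d(net, 256, [3, 3], padding='VALID', scope='conv3')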

There is not much more to say about that, so let's mention how to train directly with one of the classic networks that ship with slim:

import tensorflow as tf
slim = tf.contrib.slim
vgg = tf.contrib.slim.nets.vgg
# Load the images and labels.
images, labels = ...
# Create the model.
predictions, _ = vgg.vgg_16(images)
# Define the loss functions and get the total loss.
loss = slim.losses.softmax_cross_entropy(predictions, labels)
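The snippet stops at the loss; for completeness, here is a hedged sketch of how training usually continues with slim's own helpers (the optimizer, learning rate, and log directory are illustrative assumptions, not taken from the original post):

# Gather the total loss, including any regularization losses registered by slim layers.
total_loss = slim.losses.get_total_loss()

# Build a train op and let slim drive the training loop.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)  # assumed value
train_op = slim.learning.create_train_op(total_loss, optimizer)
slim.learning.train(train_op, logdir='/tmp/vgg_train')  # assumed log directory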

[Note] slim's convolution layers use SAME padding by default, which means the feature map after a convolution has the same spatial size as before it. Residual networks rely on exactly this property, since the skip connection adds the input to the output element-wise.
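A small sketch to check that claim (using a hypothetical 28x28 feature map with 64 channels; the TF 1.x / tf.contrib.slim API is assumed, as elsewhere in this post):

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.placeholder(tf.float32, [None, 28, 28, 64])
y = slim.conv2d(x, 64, [3, 3])            # SAME padding by default
print(y.get_shape().as_list())            # [None, 28, 28, 64] -- spatial size preserved
print((x + y).get_shape().as_list())      # so the element-wise skip addition is well defined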

 

4. The Residual Network Model

Writing the residual network unit:

import tensorflow as tf
import tensorflow.contrib.slim as slim

def resnet_block(inputs, ksize, num_outputs, i):
    # pre-activation residual unit: BN -> ELU -> conv -> BN -> ELU -> conv, then add the input back
    with tf.variable_scope('res_unit' + str(i)) as scope:
        part1 = slim.batch_norm(inputs, activation_fn=None)
        part2 = tf.nn.elu(part1)
        part3 = slim.conv2d(part2, num_outputs, [ksize, ksize], activation_fn=None)
        part4 = slim.batch_norm(part3, activation_fn=None)
        part5 = tf.nn.elu(part4)
        part6 = slim.conv2d(part5, num_outputs, [ksize, ksize], activation_fn=None)
        output = part6 + inputs  # skip connection
        return output

def resnet(X_input, ksize, num_outputs, num_classes, num_blocks):
    # stem convolution, a stack of residual blocks, a convolutional head, then global average pooling
    layer1 = slim.conv2d(X_input, num_outputs, [ksize, ksize], normalizer_fn=slim.batch_norm, scope='conv_0')
    for i in range(num_blocks):
        layer1 = resnet_block(layer1, ksize, num_outputs, i + 1)
    top = slim.conv2d(layer1, num_classes, [ksize, ksize], normalizer_fn=slim.batch_norm, activation_fn=None, scope='conv_top')
    top = tf.reduce_mean(top, [1, 2])  # global average pooling over the spatial dimensions
    output = slim.layers.softmax(slim.layers.flatten(top))
    return output
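A quick shape check of the two functions above (a minimal sketch; the 32x32 RGB input, 10 classes, and 5 blocks are illustrative values, the real ones come from the config module in the next section):

import tensorflow as tf

tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, 32, 32, 3], name='input')
probs = resnet(X, ksize=3, num_outputs=64, num_classes=10, num_blocks=5)
print(probs.get_shape().as_list())  # [None, 10] -- one probability per class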

 

5. Training the Network

import tensorflow as tf
import tensorflow.contrib.slim as slim
from scrips import config
from scrips import resUnit
from scrips import read_tfrecord
from scrips import convert2onehot
import numpy as np

log_dir = config.log_dir
model_dir = config.model_dir
IMG_W = config.IMG_W
IMG_H = config.IMG_H
IMG_CHANNELS = config.IMG_CHANNELS
NUM_CLASSES = config.NUM_CLASSES
BATCH_SIZE = config.BATCH_SIZE

tf.reset_default_graph()
X_input = tf.placeholder(shape=[None, IMG_W, IMG_H, IMG_CHANNELS], dtype=tf.float32, name='input')
y_label = tf.placeholder(shape=[None, NUM_CLASSES], dtype=tf.int32)
#*****************************************************************************************************
output = resUnit.resnet(X_input, 3, 64, NUM_CLASSES, 5)
#*****************************************************************************************************
# loss and accuracy
# note: resnet() already applies a softmax, so output holds probabilities rather than raw logits
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_label, logits=output))
train_step = tf.train.AdamOptimizer(config.lr).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(y_label, 1), tf.argmax(output, 1)), tf.float32))
# tensorboard summaries
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
# read the images and their labels from the tfrecord file
image, label = read_tfrecord.read_and_decode(config.tfrecord_dir, IMG_W, IMG_H, IMG_CHANNELS)
image_batches, label_batches = tf.train.shuffle_batch([image, label], batch_size=BATCH_SIZE, capacity=2000, min_after_dequeue=1000)
# train the network
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(model_dir)  # resume from an intermediate checkpoint if one exists
    if ckpt and ckpt.model_checkpoint_path:
        print('Restore model from ', end='')
        print(ckpt.model_checkpoint_path)
        saver.restore(sess, ckpt.model_checkpoint_path)
        if (ckpt.model_checkpoint_path.split('-')[-1]).isdigit():
            global_step = int(ckpt.model_checkpoint_path.split('-')[-1])
            print('Restore step at #', end='')
            print(global_step)
        else:
            global_step = 0
    else:
        global_step = 0
        sess.run(init)
    tensor_board_writer = tf.summary.FileWriter(log_dir, tf.get_default_graph())
    merged = tf.summary.merge_all()
    #sess.graph.finalize()
    threads = tf.train.start_queue_runners(sess=sess)
    while True:
        try:
            global_step += 1
            X_train, y_train = sess.run([image_batches, label_batches])
            y_train_onehot = convert2onehot.one_hot(y_train, NUM_CLASSES)
            feed_dict = {X_input: X_train, y_label: y_train_onehot}
            [_, temp_loss, temp_accuracy, summary] = sess.run([train_step, loss, accuracy, merged], feed_dict=feed_dict)
            tensor_board_writer.add_summary(summary, global_step)
            if global_step % config.display == 0:
                print('step at #{},'.format(global_step), end=' ')
                print('train loss: {:.5f}'.format(temp_loss), end=' ')
                print('train accuracy: {:.2f}%'.format(temp_accuracy * 100))
            if global_step % config.snapshot == 0:
                saver.save(sess, model_dir + '/model.ckpt', global_step)
        except:
            tensor_board_writer.close()
            break
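The convert2onehot helper comes from the author's own scrips package and is not shown in the post; below is a minimal numpy sketch of what such a one-hot conversion typically looks like (an assumption for illustration, not the author's actual code):

import numpy as np

def one_hot(labels, num_classes):
    # labels: integer class ids of shape [batch]; returns a [batch, num_classes] one-hot matrix
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded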

 

Reposted from: https://my.oschina.net/u/778683/blog/3100957
