As usual, a figure is the most intuitive place to start:
On the left is the ResNet-50 bottleneck; on the right is the ResNeXt bottleneck with 32 groups (cardinality = 32).
ResNeXt builds on ResNet by borrowing Inception's idea of widening the network (by contrast, Inception-ResNet is still fundamentally an Inception network, it just adopts the residual idea).
Moreover, the ResNeXt block comes in three equivalent forms:
The local CPN code uses the third form, while the implementations commonly found on GitHub use the second. A sketch of the third form follows.
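For reference, the third form replaces the explicit split/transform/concatenate with a single grouped 3x3 convolution. Below is a minimal sketch of form (c), my own illustration rather than the CPN code; since TF 1.x convolutions have no group argument, the grouping is emulated with tf.split / tf.concat, and batch normalization is omitted for brevity:

import tensorflow as tf

def grouped_bottleneck(x, cardinality=32, group_width=4, out_dim=256, stride=1, scope='resnext_c'):
    """Form (c) of the ResNeXt block: 1x1 conv -> grouped 3x3 conv -> 1x1 conv."""
    with tf.variable_scope(scope):
        width = cardinality * group_width  # e.g. 32 * 4 = 128 channels in the middle
        h = tf.layers.conv2d(x, width, 1, strides=1, padding='same', activation=tf.nn.relu)

        # Grouped 3x3 convolution, emulated by splitting along channels,
        # convolving each group independently, and concatenating the results.
        groups = tf.split(h, cardinality, axis=-1)
        groups = [tf.layers.conv2d(g, group_width, 3, strides=stride,
                                   padding='same', activation=tf.nn.relu)
                  for g in groups]
        h = tf.concat(groups, axis=-1)

        # Final 1x1 conv restores the block's output width.
        h = tf.layers.conv2d(h, out_dim, 1, strides=1, padding='same')
        return h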
Below is a summary of my notes on the core code.
def Build_ResNext(self, input_x):
    # only cifar10 architecture
    # Because this targets CIFAR-10, many parts are simplified and there cannot be too many strided layers.
    input_x = self.first_layer(input_x, scope='first_layer')
    x = self.residual_layer(input_x, out_dim=64, layer_num='1')

# The core part
def residual_layer(self, input_x, out_dim, layer_num, res_block=blocks):
    # split + transform(bottleneck) + transition + merge
    for i in range(res_block):
        # input_dim = input_x.get_shape().as_list()[-1]
        input_dim = int(np.shape(input_x)[-1])

        if input_dim * 2 == out_dim:
            flag = True
            stride = 2
            channel = input_dim // 2
        else:
            flag = False
            stride = 1

        # The first block of a stage uses stride=2; split_layer covers everything in
        # form (b) up to and including the concatenate.
        x = self.split_layer(input_x, stride=stride, layer_name='split_layer_' + layer_num + '_' + str(i))
        # Corresponds to the final step (the last 1x1 conv).
        x = self.transition_layer(x, out_dim=out_dim, scope='trans_layer_' + layer_num + '_' + str(i))

        if flag is True:
            pad_input_x = Average_pooling(input_x)
            pad_input_x = tf.pad(pad_input_x, [[0, 0], [0, 0], [0, 0], [channel, channel]])  # [?, height, width, channel]
        else:
            pad_input_x = input_x

        input_x = Relu(x + pad_input_x)

    return input_x

def split_layer(self, input_x, stride, layer_name):
    with tf.name_scope(layer_name):
        layers_split = list()
        for i in range(cardinality):
            splits = self.transform_layer(input_x, stride=stride, scope=layer_name + '_splitN_' + str(i))
            layers_split.append(splits)
        return Concatenation(layers_split)

def transform_layer(self, x, stride, scope):
    with tf.name_scope(scope):
        x = conv_layer(x, filter=depth, kernel=[1, 1], stride=stride, layer_name=scope + '_conv1')
        x = Batch_Normalization(x, training=self.training, scope=scope + '_batch1')
        x = Relu(x)

        x = conv_layer(x, filter=depth, kernel=[3, 3], stride=1, layer_name=scope + '_conv2')
        x = Batch_Normalization(x, training=self.training, scope=scope + '_batch2')
        x = Relu(x)
        return x
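The excerpt above relies on several helpers that are not shown (conv_layer, Batch_Normalization, Relu, Average_pooling, Concatenation, transition_layer). A rough reconstruction of what they likely look like, for readability; this is my own sketch, not the original repo code:

import tensorflow as tf

def conv_layer(x, filter, kernel, stride, layer_name):
    with tf.name_scope(layer_name):
        return tf.layers.conv2d(x, filters=filter, kernel_size=kernel,
                                strides=stride, padding='SAME', use_bias=False)

def Batch_Normalization(x, training, scope):
    with tf.variable_scope(scope):
        return tf.layers.batch_normalization(x, training=training)

def Relu(x):
    return tf.nn.relu(x)

def Average_pooling(x, pool_size=[2, 2], stride=2, padding='SAME'):
    # Halves the spatial size of the shortcut when the block downsamples.
    return tf.layers.average_pooling2d(x, pool_size=pool_size, strides=stride, padding=padding)

def Concatenation(layers):
    # Merge the `cardinality` transform paths along the channel axis (NHWC).
    return tf.concat(layers, axis=3)

def transition_layer(x, out_dim, scope, training=True):
    # In the original this is a method (self.transition_layer): the final 1x1 conv + BN
    # that sets the block's output channel count (no ReLU before the residual addition).
    with tf.name_scope(scope):
        x = conv_layer(x, filter=out_dim, kernel=[1, 1], stride=1, layer_name=scope + '_conv1')
        x = Batch_Normalization(x, training=training, scope=scope + '_batch1')
        return x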
The SENet paper opens by introducing Inception and the Inside-Outside Network (which incorporates spatial context into the network).
Both improve the network from the spatial side.
The authors instead want to work at a different level, namely the relationships between feature channels.
The core SE operations:
Global average pooling is used as the Squeeze operation. Two Fully Connected layers then form a bottleneck structure that models inter-channel correlations and outputs as many weights as there are input channels. The feature dimension is first reduced to 1/16 of the input, passed through ReLU, and then restored to the original dimension by a second Fully Connected layer. Compared with a single Fully Connected layer, this has two advantages: 1) more non-linearity, so it can better fit the complex correlations between channels; 2) far fewer parameters and less computation. A Sigmoid gate then produces normalized weights between 0 and 1, and finally a Scale operation applies these weights to each channel of the features.
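Written out directly, the steps described above come down to something like the following minimal sketch with reduction ratio 16 (se_block and its argument names are illustrative, not taken from the paper's code):

import tensorflow as tf

def se_block(x, ratio=16):
    """Squeeze-and-Excitation on an NHWC tensor: GAP -> FC bottleneck -> sigmoid -> scale."""
    channels = x.get_shape().as_list()[-1]

    squeeze = tf.reduce_mean(x, axis=[1, 2])            # Squeeze: global average pooling -> [N, C]
    excite = tf.layers.dense(squeeze, channels // ratio,
                             activation=tf.nn.relu)     # reduce to C/16, add non-linearity
    excite = tf.layers.dense(excite, channels,
                             activation=tf.nn.sigmoid)  # back to C, gated to (0, 1)

    scale = tf.reshape(excite, [-1, 1, 1, channels])    # Scale: reweight each channel
    return x * scale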
Here the SE operation is appended after ResNet's original bottleneck.
The code is very easy to follow:
def se_resnet_bottleneck(l, ch_out, stride):
    shortcut = l
    l = Conv2D('conv1', l, ch_out, 1, activation=BNReLU)
    l = Conv2D('conv2', l, ch_out, 3, strides=stride, activation=BNReLU)
    l = Conv2D('conv3', l, ch_out * 4, 1, activation=get_bn(zero_init=True))

    squeeze = GlobalAvgPooling('gap', l)
    squeeze = FullyConnected('fc1', squeeze, ch_out // 4, activation=tf.nn.relu)
    squeeze = FullyConnected('fc2', squeeze, ch_out * 4, activation=tf.nn.sigmoid)

    data_format = get_arg_scope()['Conv2D']['data_format']
    ch_ax = 1 if data_format in ['NCHW', 'channels_first'] else 3
    shape = [-1, 1, 1, 1]
    shape[ch_ax] = ch_out * 4
    l = l * tf.reshape(squeeze, shape)

    out = l + resnet_shortcut(shortcut, ch_out * 4, stride, activation=get_bn(zero_init=False))
    return tf.nn.relu(out)


def GlobalAvgPooling(x, data_format='channels_last'):
    """
    Global average pooling as in the paper `Network In Network <http://arxiv.org/abs/1312.4400>`_.

    Args:
        x (tf.Tensor): a 4D tensor.

    Returns:
        tf.Tensor: a NC tensor named ``output``.
        Outputs a vector with one value per input channel. This op pools the
        spatial information away, which reduces dimensionality when the parameter
        count would otherwise be large; it acts somewhat like a fully connected layer.
    """
    assert x.shape.ndims == 4
    data_format = get_data_format(data_format)
    axis = [1, 2] if data_format == 'channels_last' else [2, 3]
    return tf.reduce_mean(x, axis, name='output')
SE-ResNeXt (TensorFlow)
Simply replace the bottleneck in SENet with ResNeXt's bottleneck.
Concretely, in the ResNeXt code a squeeze_excitation_layer is added after the transition inside residual_layer, as sketched below.
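A sketch of that wiring inside the residual_layer loop (the exact signature of squeeze_excitation_layer may differ from repo to repo; it follows the same squeeze/excite pattern as above):

# Inside residual_layer, after the transition (SE-ResNeXt version of the loop body):
x = self.split_layer(input_x, stride=stride,
                     layer_name='split_layer_' + layer_num + '_' + str(i))
x = self.transition_layer(x, out_dim=out_dim,
                          scope='trans_layer_' + layer_num + '_' + str(i))
# New in SE-ResNeXt: recalibrate the channels before the residual addition.
x = self.squeeze_excitation_layer(x, out_dim=out_dim, ratio=16,
                                  layer_name='squeeze_layer_' + layer_num + '_' + str(i))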
All of the above are ResNet variants.
The reason for studying only ResNet variants:
relative to their depth they have few parameters, achieve high accuracy, and can be built very deep.