ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Paper link: https://arxiv.org/pdf/1707.01083.pdf
This post introduces ShuffleNet, an extremely computation-efficient CNN architecture designed for mobile devices with very limited computing power. It relies mainly on two operations, pointwise group convolution and channel shuffle, which greatly reduce computation cost while maintaining accuracy. Experiments show that ShuffleNet outperforms other state-of-the-art architectures on ImageNet classification; under a computation budget of 40 MFLOPs it maintains accuracy comparable to AlexNet while running nearly 13x faster.
The general trend is to build deeper and larger convolutional networks to solve recognition tasks. These highly accurate networks have a large number of layers and channels and require computation on the order of billions of FLOPs. This paper considers the opposite extreme: preserving task accuracy on mobile devices whose computing power is limited. Much of the existing work in this area focuses on pruning, compressing, or low-bit representation of a "basic" network architecture; this paper instead focuses on designing an efficient basic architecture for the desired computation range.
The paper uses pointwise group convolutions to reduce the computation cost of 1x1 convolutions, and it proposes the channel shuffle operation to counteract the side effect of group convolution, namely that information stops flowing between channel groups. On these two operations it builds an efficient architecture named ShuffleNet. For a given computation budget, ShuffleNet affords more feature map channels, and these channels encode more information, which is especially important for the performance of small networks.
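To make the saving concrete, here is a rough back-of-the-envelope sketch of the multiply-accumulate cost (the layer sizes are made up purely for illustration):

```python
# Multiply-accumulate cost of a 1x1 convolution over an h x w feature map:
#   plain 1x1 conv:    h * w * c_in * c_out
#   grouped, g groups: g * (h * w * (c_in // g) * (c_out // g)) == plain / g
h, w, c_in, c_out, g = 28, 28, 240, 240, 3  # hypothetical layer sizes

plain = h * w * c_in * c_out
grouped = g * (h * w * (c_in // g) * (c_out // g))
print(plain, grouped, plain // grouped)  # -> 45158400 15052800 3
```

So grouping the 1x1 convolution into g groups cuts its cost by a factor of g, which is exactly the budget that ShuffleNet reinvests in wider feature maps.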
ShuffleNet outperforms many state-of-the-art architectures; for example, it reduces top-1 error on ImageNet classification by 7.8% (absolute) compared with MobileNet. And while maintaining accuracy comparable to AlexNet, it achieves an actual speedup of roughly 13x (18x in theory).
In recent years, deep neural networks have achieved great success in computer vision, and model design has played an important role in that success. The growing demand for high-quality deep networks on embedded systems has spurred the design of efficient models. For example, GoogLeNet uses far fewer parameters than architectures that simply stack convolutional layers to grow deeper; ResNet achieves strong performance through its bottleneck structure; and a more recent line of work uses reinforcement learning and model search to design efficient architectures.
The concept of group convolution first appeared in AlexNet, where it was used to split the model across two GPUs, and it also proved effective in ResNeXt. Depthwise separable convolution generalizes the earlier idea of separable convolutions; MobileNet uses depthwise separable convolutions and obtains state-of-the-art results among lightweight models. This paper generalizes group convolution and depthwise separable convolution into a new form.
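For readers who want the building block in code, here is a minimal sketch of a depthwise separable convolution (my example, using tf.keras; the shapes and filter counts are arbitrary):

```python
from tensorflow.keras import layers, models

# Depthwise separable convolution = a per-channel 3x3 spatial convolution
# (DepthwiseConv2D) followed by a 1x1 pointwise convolution that mixes channels.
inp = layers.Input(shape=(112, 112, 32))
x = layers.DepthwiseConv2D(kernel_size=3, padding='same', use_bias=False)(inp)
x = layers.Conv2D(filters=64, kernel_size=1, use_bias=False)(x)
model = models.Model(inp, x)
model.summary()  # far fewer parameters than a full 3x3 conv from 32 to 64 channels
```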
Although layers of "random sparse convolution" exist in CNN libraries, channel shuffle was rarely mentioned in previous work. Concurrently with this research, another work adopted the same idea for a two-stage convolution, but it did not specifically study the effectiveness of channel shuffle or its use in small models.
Work in this direction aims to accelerate inference while preserving the accuracy of a pre-trained model. Pruning removes redundant connections or channels from the pre-trained network while keeping its performance; quantization and factorization reduce redundancy in the computation to speed up inference; and FFT-based and similar methods optimize the convolution algorithm itself, without modifying the model's parameters, to cut time cost in practice. Distillation uses a large network to supervise the training of a small one, making the small network easier to train.
Because group convolution blocks the flow of information between channel groups, the channel shuffle operation is used to let channels communicate across groups, which substantially improves the network's representational power.
Concretely, the shuffle reshapes the channel dimension into (groups, channels per group), transposes those two axes, and flattens the result back, as illustrated in Figure 1 of the paper.
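A one-dimensional numpy demonstration of this reshape-transpose-flatten trick (it mirrors the docstring example in the implementation below):

```python
import numpy as np

channels, groups = 9, 3
x = np.arange(channels)                    # [0 1 2 3 4 5 6 7 8]
x = x.reshape(groups, channels // groups)  # group-major layout, one row per group
x = x.T.flatten()                          # transpose the two axes, then flatten
print(x)                                   # [0 3 6 1 4 7 2 5 8]
```

Each output group now contains one channel from every input group, so the next group convolution sees information from all groups.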
The pointwise group convolution proposed in the paper is simply the fusion of a 1x1 convolution (pointwise convolution) with group convolution, i.e. pointwise group convolution = pointwise convolution + group convolution.
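The Keras 2 code below emulates this by slicing the input channels into groups and concatenating per-group 1x1 convolutions. For reference, newer tf.keras (TensorFlow >= 2.3) exposes the same operation directly through Conv2D's `groups` argument; a one-line sketch (my addition, with arbitrary sizes):

```python
from tensorflow.keras import layers

# A 1x1 convolution computed in 3 independent channel groups; equivalent to
# slice -> per-group Conv2D -> concatenate, as done manually in _group_conv below.
gconv_1x1 = layers.Conv2D(filters=240, kernel_size=1, groups=3, use_bias=False)
```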
Based on these two techniques, the paper proposes the ShuffleNet unit, which comes in three variants (Figure 2 of the paper): (a) a plain bottleneck unit with a depthwise convolution, (b) the ShuffleNet unit with pointwise group convolutions and channel shuffle, and (c) the stride-2 ShuffleNet unit with an average-pooling shortcut and concatenation.
In those diagrams, GConv denotes the pointwise group convolution operation; the details can be seen in the code below. DWConv is the depthwise (spatial) convolution popularized by MobileNet; fusing it with a pointwise convolution yields the depthwise separable convolution. For the underlying principles, refer to the companion blog post on the MobileNet algorithm.
```python
from keras import backend as K
from keras.applications.imagenet_utils import _obtain_input_shape
from keras.models import Model
from keras.engine.topology import get_source_inputs
from keras.layers import Activation, Add, Concatenate, GlobalAveragePooling2D, \
    GlobalMaxPooling2D, Input, Dense
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, BatchNormalization, Lambda
from keras.applications.mobilenet import DepthwiseConv2D
import numpy as np


def ShuffleNet(include_top=True, input_tensor=None, scale_factor=1.0, pooling='max',
               input_shape=(224, 224, 3), groups=1, load_model=None,
               num_shuffle_units=[3, 7, 3], bottleneck_ratio=0.25, classes=1000):
    """ShuffleNet implementation for Keras 2

    ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
    Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
    https://arxiv.org/pdf/1707.01083.pdf

    Note that only TensorFlow is supported for now, therefore it only works with
    the data format `image_data_format='channels_last'` in your Keras config
    at `~/.keras/keras.json`.

    Parameters
    ----------
    include_top: bool(True)
        whether to include the fully-connected layer at the top of the network.
    input_tensor:
        optional Keras tensor (i.e. output of `layers.Input()`) to use as image input.
    scale_factor:
        scales the number of output channels.
    input_shape:
        shape of the input image, e.g. (224, 224, 3).
    pooling:
        pooling mode applied after the last convolutional layer: `avg` applies
        global average pooling, `max` applies global max pooling.
    groups: int
        number of groups used by the grouped convolutions.
    num_shuffle_units: list([3,7,3])
        number of stages (list length) and number of shufflenet units per stage,
        beginning with stage 2 because stage 1 is fixed. E.g. idx 0 contains
        3 + 1 shufflenet units for stage 2 (the first unit of each stage
        differs), idx 1 contains 7 + 1 units for stage 3 and idx 2 contains
        3 + 1 units for stage 4.
    bottleneck_ratio:
        ratio of bottleneck channels to output channels. A ratio of 1 : 4 means
        the output feature map is 4 times as wide as the bottleneck feature map.
    classes: int(1000)
        number of classes to predict.

    Returns
    -------
    A Keras model instance.

    References
    ----------
    - [ShuffleNet: An Extremely Efficient Convolutional Neural Network for
      Mobile Devices](https://arxiv.org/pdf/1707.01083.pdf)
    """
    if K.backend() != 'tensorflow':
        raise RuntimeError('Only the TensorFlow backend is currently supported.')

    name = "ShuffleNet_%.2gX_g%d_br_%.2g_%s" % (scale_factor, groups, bottleneck_ratio,
                                                "".join([str(x) for x in num_shuffle_units]))

    input_shape = _obtain_input_shape(input_shape, default_size=224, min_size=28,
                                      require_flatten=include_top,
                                      data_format=K.image_data_format())

    # stage-2 output channels for each supported group count (Table 1 of the paper)
    out_dim_stage_two = {1: 144, 2: 200, 3: 240, 4: 272, 8: 384}
    if groups not in out_dim_stage_two:
        raise ValueError("Invalid number of groups.")

    if pooling not in ['max', 'avg']:
        raise ValueError("Invalid value for pooling.")

    if not (float(scale_factor) * 4).is_integer():
        raise ValueError("Invalid value for scale_factor; it should be a multiple of 0.25.")

    # compute the number of output channels for each stage: channels double
    # from stage to stage, except stage 1, which is fixed at 24
    exp = np.insert(np.arange(0, len(num_shuffle_units), dtype=np.float32), 0, 0)
    out_channels_in_stage = 2 ** exp
    out_channels_in_stage *= out_dim_stage_two[groups]
    out_channels_in_stage[0] = 24  # the first stage always has 24 output channels
    out_channels_in_stage *= scale_factor
    out_channels_in_stage = out_channels_in_stage.astype(int)

    # build the model input
    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    # stage 1: plain 3x3 convolution followed by max pooling
    x = Conv2D(filters=out_channels_in_stage[0], kernel_size=(3, 3), padding='same',
               use_bias=False, strides=(2, 2), activation="relu", name="conv1")(img_input)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same', name="maxpool1")(x)

    # stages 2-4: three blocks of shufflenet units
    for stage in range(0, len(num_shuffle_units)):
        repeat = num_shuffle_units[stage]
        x = _block(x, out_channels_in_stage, repeat=repeat,
                   bottleneck_ratio=bottleneck_ratio, groups=groups, stage=stage + 2)

    # top of the model
    if pooling == 'avg':
        x = GlobalAveragePooling2D(name="global_pool")(x)
    elif pooling == 'max':
        x = GlobalMaxPooling2D(name="global_pool")(x)

    if include_top:
        x = Dense(units=classes, name="fc")(x)
        x = Activation('softmax', name='softmax')(x)

    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

    model = Model(inputs=inputs, outputs=x, name=name)

    if load_model is not None:
        model.load_weights(load_model, by_name=True)

    return model


def _block(x, channel_map, bottleneck_ratio, repeat=1, groups=1, stage=1):
    """Creates a block (one stage) containing `repeat + 1` shuffle units.

    Parameters
    ----------
    x: input tensor in `channels_last` data format
    channel_map: list
        number of output channels for each stage
    bottleneck_ratio: float
        ratio of bottleneck channels to output channels of the pointwise
        group convolutions
    repeat: int(1)
        number of repetitions of the stride-1 shuffle unit
    groups: int(1)
        number of groups used by the grouped convolutions
    stage: int(1)
        stage number

    Returns
    -------
    Output tensor of the block.
    """
    # the first unit of each stage downsamples with stride 2 and concatenates
    # with its shortcut; all remaining units use stride 1 and a plain Add
    x = _shuffle_unit(x, in_channels=channel_map[stage - 2],
                      out_channels=channel_map[stage - 1], strides=2,
                      groups=groups, bottleneck_ratio=bottleneck_ratio,
                      stage=stage, block=1)

    for i in range(1, repeat + 1):
        x = _shuffle_unit(x, in_channels=channel_map[stage - 1],
                          out_channels=channel_map[stage - 1], strides=1,
                          groups=groups, bottleneck_ratio=bottleneck_ratio,
                          stage=stage, block=(i + 1))
    return x


def _shuffle_unit(inputs, in_channels, out_channels, groups, bottleneck_ratio,
                  strides=2, stage=1, block=1):
    """Creates a shuffle unit.

    Parameters
    ----------
    inputs: input tensor in `channels_last` data format
    in_channels: number of input channels
    out_channels: number of output channels
    groups: int(1)
        number of groups used by the grouped convolutions
    bottleneck_ratio: float
        ratio of bottleneck channels to output channels
    strides: int or tuple/list of 2 ints
        strides of the depthwise convolution
    stage: int(1)
        stage number
    block: int(1)
        block number

    Returns
    -------
    Output tensor of the unit.
    """
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        bn_axis = 1

    prefix = 'stage%d/block%d' % (stage, block)

    # default: bottleneck channels are 1/4 of the unit's output channels
    bottleneck_channels = int(out_channels * bottleneck_ratio)

    # no group convolution on the first pointwise layer of stage 2, because
    # its input (24 channels from stage 1) is too small to split into groups
    groups = (1 if stage == 2 and block == 1 else groups)

    x = _group_conv(inputs, in_channels, out_channels=bottleneck_channels,
                    groups=groups, name='%s/1x1_gconv_1' % prefix)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_gconv_1' % prefix)(x)
    x = Activation('relu', name='%s/relu_gconv_1' % prefix)(x)

    # the channel shuffle layer, implemented with a Lambda layer
    x = Lambda(channel_shuffle, arguments={'groups': groups},
               name='%s/channel_shuffle' % prefix)(x)

    x = DepthwiseConv2D(kernel_size=(3, 3), padding="same", use_bias=False,
                        strides=strides, name='%s/1x1_dwconv_1' % prefix)(x)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_dwconv_1' % prefix)(x)

    # for the stride-2 unit, leave room for the concatenated shortcut channels
    x = _group_conv(x, bottleneck_channels,
                    out_channels=out_channels if strides == 1 else out_channels - in_channels,
                    groups=groups, name='%s/1x1_gconv_2' % prefix)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_gconv_2' % prefix)(x)

    # the stride decides between Add (stride 1) and Concatenate (stride 2)
    if strides < 2:
        ret = Add(name='%s/add' % prefix)([x, inputs])
    else:
        avg = AveragePooling2D(pool_size=3, strides=2, padding='same',
                               name='%s/avg_pool' % prefix)(inputs)
        ret = Concatenate(axis=bn_axis, name='%s/concat' % prefix)([x, avg])

    ret = Activation('relu', name='%s/relu_out' % prefix)(ret)
    return ret


def _group_conv(x, in_channels, out_channels, groups, kernel=1, stride=1, name=''):
    """Grouped convolution, implemented by slicing the input channels into
    groups, convolving each slice separately and concatenating the results.

    Parameters
    ----------
    x: input tensor in `channels_last` data format
    in_channels: number of input channels
    out_channels: number of output channels
    groups: number of groups
    kernel: int(1)
        width and height of the convolution window
    stride: int(1)
        strides of the convolution
    name: str
        layer name

    Returns
    -------
    Output tensor of the grouped convolution.
    """
    if groups == 1:
        return Conv2D(filters=out_channels, kernel_size=kernel, padding='same',
                      use_bias=False, strides=stride, name=name)(x)

    # number of input channels per group
    ig = in_channels // groups
    group_list = []

    assert out_channels % groups == 0

    for i in range(groups):
        offset = i * ig
        # bind offset/ig as default arguments so every Lambda keeps its own
        # slice (a plain closure would late-bind to the last loop value)
        group = Lambda(lambda z, offset=offset, ig=ig: z[:, :, :, offset:offset + ig],
                       name='%s/g%d_slice' % (name, i))(x)
        group_list.append(Conv2D(int(0.5 + out_channels / groups),
                                 kernel_size=kernel, strides=stride,
                                 use_bias=False, padding='same',
                                 name='%s_/g%d' % (name, i))(group))
    return Concatenate(name='%s/concat' % name)(group_list)


def channel_shuffle(x, groups):
    """Shuffles the channels of `x` using the transpose trick from the paper.

    Parameters
    ----------
    x: input tensor in `channels_last` data format
    groups: int
        number of groups

    Returns
    -------
    Channel-shuffled output tensor.

    Examples
    --------
    Example for a 1D array with 3 groups:
    >>> d = np.array([0,1,2,3,4,5,6,7,8])
    >>> x = np.reshape(d, (3,3))
    >>> x = np.transpose(x, [1,0])
    >>> x = np.reshape(x, (9,))
    '[0 1 2 3 4 5 6 7 8] --> [0 3 6 1 4 7 2 5 8]'
    """
    height, width, in_channels = x.shape.as_list()[1:]
    channels_per_group = in_channels // groups

    x = K.reshape(x, [-1, height, width, groups, channels_per_group])
    x = K.permute_dimensions(x, (0, 1, 2, 4, 3))  # transpose the two channel axes
    x = K.reshape(x, [-1, height, width, in_channels])
    return x
```
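Finally, a minimal usage sketch (my addition, not part of the original implementation): instantiate the g = 3 configuration from the paper and inspect the resulting model.

```python
# Build the ShuffleNet 1x, g = 3 configuration and print its layers.
model = ShuffleNet(groups=3, scale_factor=1.0, pooling='avg',
                   input_shape=(224, 224, 3), classes=1000)
model.summary()
```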