
Convolution Compression Algorithms: ShuffleNet

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Paper: https://arxiv.org/pdf/1707.01083.pdf


Abstract

        The paper introduces ShuffleNet, an extremely computation-efficient CNN architecture designed for mobile devices with very limited computing power. It relies on two operations: pointwise group convolution and channel shuffle. Together they greatly reduce computation cost while preserving accuracy. Experiments show that on ImageNet classification ShuffleNet outperforms other state-of-the-art architectures, e.g. under a computation budget of 40 MFLOPs, and that it achieves roughly a 13x actual speedup over AlexNet while maintaining comparable accuracy.

Introduction

       The current trend is to build deeper and larger convolutional networks to solve recognition tasks. These high-accuracy networks have many layers and channels and require billions of FLOPs. This work looks at the problem from the opposite direction: maintaining task accuracy on mobile devices whose computing power is limited. Much existing work in this area focuses on pruning, compressing, or low-bit representation of a "basic" network architecture; this paper instead focuses on designing an efficient basic architecture that fits the targeted computation range.

        The paper uses pointwise group convolutions to reduce the computational complexity of 1x1 convolutions. To counter the side effect of group convolution, namely that information does not flow between the channel groups, it proposes the channel shuffle operation. On top of these two operations an efficient architecture named ShuffleNet is built. For a given computation budget, ShuffleNet allows more feature map channels, which encode more information; this is especially important for the performance of small networks.

        ShuffleNet outperforms many state-of-the-art architectures; for example, its top-1 error on ImageNet classification is 7.8% lower than that of MobileNet. It also achieves about a 13x actual speedup (18x theoretical) over AlexNet while maintaining comparable accuracy.

Related Work

Efficient model design

        In recent years deep neural networks have achieved great success in computer vision, and model design has played an important role in that success. The growing need to run high-quality deep networks on embedded systems motivates the study of efficient model design. For example, GoogLeNet uses far fewer parameters than architectures that simply stack convolution layers to increase depth; ResNet achieves strong performance with its bottleneck structure; and a more recent line of work uses reinforcement learning and model search to design efficient models.

Group convolution

       The concept of group convolution first appeared in AlexNet, where it was used to distribute the model across two GPUs, and it was also shown to work well in ResNeXt. Depthwise separable convolution generalizes the idea of separable convolutions. MobileNet uses depthwise separable convolutions and obtains state-of-the-art results among lightweight models. This paper generalizes group convolution and depthwise separable convolution into a new form.

Channel shuffle operation

       Although "random sparse convolution" layers exist in CNN libraries, the channel shuffle operation was rarely mentioned in prior work. Concurrently with this research, others adopted the same idea for a two-stage convolution, but they did not specifically investigate the effectiveness of channel shuffle itself or its use in small models.

Model acceleration

        The goal of this direction is to accelerate inference while preserving the accuracy of a pre-trained model. Pruning removes redundant connections or channels of a pre-trained network while keeping its original performance. Quantization and factorization reduce redundancy in the computation to speed up inference. Without modifying the model parameters, FFT-based and other optimized convolution algorithms reduce the time cost in practice. Distillation transfers knowledge from a large network to a small one, making the small network easier to train.

Method

Channel shuffle for group convolutions

        Because group convolution blocks the flow of information between the channel groups, the channel shuffle operation is used to let channels communicate across groups, which considerably improves the network's representational power.

        The shuffle is illustrated as follows:

[Figure: channel shuffle — the output channels of one group convolution are divided into subgroups, and each group of the next group convolution receives a subgroup from every previous group]


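Since the figure is not reproduced here, a tiny NumPy sketch may help; it mirrors the reshape-transpose-flatten example from the docstring of the implementation below (the 9-channel, 3-group sizes are chosen only for illustration):

import numpy as np

# 9 channels split into 3 groups: [0 1 2 | 3 4 5 | 6 7 8]
d = np.arange(9)
x = d.reshape(3, 3)       # rows = groups, columns = channels within a group
x = x.transpose(1, 0)     # swap the group axis and the within-group axis
print(x.reshape(9))       # [0 3 6 1 4 7 2 5 8] -- each new group draws from all old groups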
Pointwise group convolution

            The pointwise group convolution proposed in the paper is simply the fusion of a 1x1 convolution (pointwise convolution) with group convolution, i.e. pointwise group convolution = pointwise convolution + group convolution.
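To see why this saves computation: a dense 1x1 convolution over an h x w feature map costs h*w*c_in*c_out multiply-adds, and splitting it into g groups divides that by g. A quick back-of-the-envelope sketch (the 28x28, 240-channel sizes below are illustrative values, not figures from the paper):

def pointwise_macs(h, w, c_in, c_out, g=1):
    # each group maps c_in/g input channels to c_out/g output channels; g groups in total
    return g * (h * w) * (c_in // g) * (c_out // g)

print(pointwise_macs(28, 28, 240, 240))       # 45,158,400 for a dense 1x1 conv
print(pointwise_macs(28, 28, 240, 240, g=3))  # 15,052,800 with g=3, i.e. 3x cheaper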

              ShuffleNet units are built on top of these two techniques; the paper presents three variants:

[Figure: (a) a bottleneck unit with a 3x3 depthwise convolution; (b) a ShuffleNet unit with pointwise group convolutions and channel shuffle; (c) a ShuffleNet unit with stride 2 for downsampling]



Here GConv denotes the pointwise group convolution operation (see the code below for the details), and DWConv is the depthwise separable convolution introduced in MobileNet, which is the fusion of a spatial (depthwise) convolution and a pointwise convolution. For the underlying principle, see the MobileNet post.

Overall ShuffleNet architecture

            The overall framework is essentially the same as ResNet: the network is divided into several stages (ResNet has 4; ShuffleNet has 3), and within each stage the ShuffleNet unit replaces the original residual block. This is the core of the ShuffleNet design. The architecture table in the paper fixes the overall complexity and varies the number of groups g, which changes the number of output channels; more output channels generally let the network extract more features.
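As a concrete reading of that table (the numbers come from the paper and from the `out_dim_stage_two` map in the code below): with g = 1 the three stages output 144/288/576 channels, while with g = 3 they output 240/480/960; each stage doubles the channel count of the previous one, so a larger g buys more channels at roughly the same FLOP budget.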

Code implementation (Keras)

  
from keras import backend as K
from keras.applications.imagenet_utils import _obtain_input_shape
from keras.models import Model
from keras.engine.topology import get_source_inputs
from keras.layers import Activation, Add, Concatenate, GlobalAveragePooling2D, GlobalMaxPooling2D, Input, Dense
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, BatchNormalization, Lambda
from keras.applications.mobilenet import DepthwiseConv2D
import numpy as np


def ShuffleNet(include_top=True, input_tensor=None, scale_factor=1.0, pooling='max',
               input_shape=(224, 224, 3), groups=1, load_model=None, num_shuffle_units=[3, 7, 3],
               bottleneck_ratio=0.25, classes=1000):
    """
    ShuffleNet implementation for Keras 2

    ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
    Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
    https://arxiv.org/pdf/1707.01083.pdf

    Note that only TensorFlow is supported for now, therefore it only works
    with the data format `image_data_format='channels_last'` in your Keras
    config at `~/.keras/keras.json`.

    Parameters
    ----------
    include_top: bool(True)
        whether to include the fully-connected layer at the top of the network.
    input_tensor:
        optional Keras tensor (i.e. output of `layers.Input()`) to use as image input for the model.
    scale_factor:
        scales the number of output channels
    input_shape:
    pooling:
        Optional pooling mode for feature extraction
        when `include_top` is `False`.
        - `None` means that the output of the model
          will be the 4D tensor output of the
          last convolutional layer.
        - `avg` means that global average pooling
          will be applied to the output of the
          last convolutional layer, and thus
          the output of the model will be a
          2D tensor.
        - `max` means that global max pooling will
          be applied.
    groups: int
        number of groups per channel
    num_shuffle_units: list([3,7,3])
        number of stages (list length) and the number of shufflenet units in a
        stage beginning with stage 2 because stage 1 is fixed
        e.g. idx 0 contains 3 + 1 (first shuffle unit in each stage differs) shufflenet units for stage 2
        idx 1 contains 7 + 1 Shufflenet Units for stage 3 and
        idx 2 contains 3 + 1 Shufflenet Units
    bottleneck_ratio:
        bottleneck ratio implies the ratio of bottleneck channels to output channels.
        For example, bottleneck ratio = 1 : 4 means the output feature map is 4 times
        the width of the bottleneck feature map.
    classes: int(1000)
        number of classes to predict

    Returns
    -------
        A Keras model instance

    References
    ----------
    - [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices]
      (http://www.arxiv.org/pdf/1707.01083.pdf)
    """
    if K.backend() != 'tensorflow':
        raise RuntimeError('Only TensorFlow backend is currently supported, '
                           'as other backends do not support the required ops.')

    name = "ShuffleNet_%.2gX_g%d_br_%.2g_%s" % (scale_factor, groups, bottleneck_ratio,
                                                "".join([str(x) for x in num_shuffle_units]))

    input_shape = _obtain_input_shape(input_shape,
                                      default_size=224,
                                      min_size=28,
                                      require_flatten=include_top,
                                      data_format=K.image_data_format())

    out_dim_stage_two = {1: 144, 2: 200, 3: 240, 4: 272, 8: 384}
    if groups not in out_dim_stage_two:
        raise ValueError("Invalid number of groups.")

    if pooling not in ['max', 'avg']:
        raise ValueError("Invalid value for pooling.")

    if not (float(scale_factor) * 4).is_integer():
        raise ValueError("Invalid value for scale_factor. Should be x over 4.")

    # Compute the number of output channels for each stage: stage 2 starts at
    # out_dim_stage_two[groups] and every following stage doubles the count.
    exp = np.insert(np.arange(0, len(num_shuffle_units), dtype=np.float32), 0, 0)
    out_channels_in_stage = 2 ** exp
    out_channels_in_stage *= out_dim_stage_two[groups]  # calculate output channels for each stage
    out_channels_in_stage[0] = 24  # first stage always has 24 output channels
    out_channels_in_stage *= scale_factor
    out_channels_in_stage = out_channels_in_stage.astype(int)

    # Build the model input.
    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    # Create the ShuffleNet architecture: stage 1 is a plain conv + max pool.
    x = Conv2D(filters=out_channels_in_stage[0], kernel_size=(3, 3), padding='same',
               use_bias=False, strides=(2, 2), activation="relu", name="conv1")(img_input)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same', name="maxpool1")(x)

    # Create stages 2-4 (three blocks in total), each containing shufflenet units.
    for stage in range(0, len(num_shuffle_units)):
        repeat = num_shuffle_units[stage]
        x = _block(x, out_channels_in_stage, repeat=repeat,
                   bottleneck_ratio=bottleneck_ratio,
                   groups=groups, stage=stage + 2)

    # Top of the model.
    if pooling == 'avg':
        x = GlobalAveragePooling2D(name="global_pool")(x)
    elif pooling == 'max':
        x = GlobalMaxPooling2D(name="global_pool")(x)

    if include_top:
        x = Dense(units=classes, name="fc")(x)
        x = Activation('softmax', name='softmax')(x)

    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

    model = Model(inputs=inputs, outputs=x, name=name)

    if load_model is not None:
        model.load_weights(load_model, by_name=True)

    return model


# Build one stage (block) of ShuffleNet.
def _block(x, channel_map, bottleneck_ratio, repeat=1, groups=1, stage=1):
    """
    creates a bottleneck block containing `repeat + 1` shuffle units

    Parameters
    ----------
    x:
        Input tensor with `channels_last` data format
    channel_map: list
        list containing the number of output channels for a stage
    repeat: int(1)
        number of repetitions for a shuffle unit with stride 1
    groups: int(1)
        number of groups per channel
    bottleneck_ratio: float
        bottleneck ratio implies the ratio of bottleneck channels to output channels.
        For example, bottleneck ratio = 1 : 4 means the output feature map is 4 times
        the width of the bottleneck feature map.
    stage: int(1)
        stage number

    Returns
    -------
    """
    # The first unit of each stage downsamples with stride 2 and uses a
    # Concatenate shortcut; all following units use stride 1 and Add.
    x = _shuffle_unit(x, in_channels=channel_map[stage - 2],
                      out_channels=channel_map[stage - 1], strides=2,
                      groups=groups, bottleneck_ratio=bottleneck_ratio,
                      stage=stage, block=1)

    for i in range(1, repeat + 1):
        x = _shuffle_unit(x, in_channels=channel_map[stage - 1],
                          out_channels=channel_map[stage - 1], strides=1,
                          groups=groups, bottleneck_ratio=bottleneck_ratio,
                          stage=stage, block=(i + 1))

    return x


def _shuffle_unit(inputs, in_channels, out_channels, groups, bottleneck_ratio, strides=2, stage=1, block=1):
    """
    creates a shuffle unit

    Parameters
    ----------
    inputs:
        Input tensor with `channels_last` data format
    in_channels:
        number of input channels
    out_channels:
        number of output channels
    strides:
        An integer or tuple/list of 2 integers,
        specifying the strides of the convolution along the width and height.
    groups: int(1)
        number of groups per channel
    bottleneck_ratio: float
        ratio of bottleneck channels to output channels of the pointwise group convolution.
        For example, bottleneck ratio = 1 : 4 means the output feature map is 4 times
        the width of the bottleneck feature map.
    stage: int(1)
        stage number
    block: int(1)
        block number

    Returns
    -------
    """
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        bn_axis = 1

    prefix = 'stage%d/block%d' % (stage, block)

    # default: 1/4 of the output channels of a ShuffleNet unit
    bottleneck_channels = int(out_channels * bottleneck_ratio)

    # Do not use group convolution for the first pointwise conv of stage 2,
    # since its input (from stage 1) has only 24 channels.
    groups = (1 if stage == 2 and block == 1 else groups)

    x = _group_conv(inputs, in_channels, out_channels=bottleneck_channels,
                    groups=(1 if stage == 2 and block == 1 else groups),
                    name='%s/1x1_gconv_1' % prefix)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_gconv_1' % prefix)(x)
    x = Activation('relu', name='%s/relu_gconv_1' % prefix)(x)

    # Channel shuffle implemented as a Lambda layer.
    x = Lambda(channel_shuffle, arguments={'groups': groups}, name='%s/channel_shuffle' % prefix)(x)
    x = DepthwiseConv2D(kernel_size=(3, 3), padding="same", use_bias=False,
                        strides=strides, name='%s/1x1_dwconv_1' % prefix)(x)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_dwconv_1' % prefix)(x)

    x = _group_conv(x, bottleneck_channels,
                    out_channels=out_channels if strides == 1 else out_channels - in_channels,
                    groups=groups, name='%s/1x1_gconv_2' % prefix)
    x = BatchNormalization(axis=bn_axis, name='%s/bn_gconv_2' % prefix)(x)

    # The stride decides the shortcut: stride 1 adds the input directly,
    # stride 2 average-pools the input and concatenates along the channel axis.
    if strides < 2:
        ret = Add(name='%s/add' % prefix)([x, inputs])
    else:
        avg = AveragePooling2D(pool_size=3, strides=2, padding='same', name='%s/avg_pool' % prefix)(inputs)
        ret = Concatenate(bn_axis, name='%s/concat' % prefix)([x, avg])

    ret = Activation('relu', name='%s/relu_out' % prefix)(ret)

    return ret


# Implement group convolution by slicing the input channels into groups,
# convolving each slice separately, and concatenating the results.
def _group_conv(x, in_channels, out_channels, groups, kernel=1, stride=1, name=''):
    """
    grouped convolution

    Parameters
    ----------
    x:
        Input tensor with `channels_last` data format
    in_channels:
        number of input channels
    out_channels:
        number of output channels
    groups:
        number of groups per channel
    kernel: int(1)
        An integer or tuple/list of 2 integers, specifying the
        width and height of the 2D convolution window.
        Can be a single integer to specify the same value for
        all spatial dimensions.
    stride: int(1)
        An integer or tuple/list of 2 integers,
        specifying the strides of the convolution along the width and height.
        Can be a single integer to specify the same value for all spatial dimensions.
    name: str
        A string that specifies the layer name

    Returns
    -------
    """
    if groups == 1:
        return Conv2D(filters=out_channels, kernel_size=kernel, padding='same',
                      use_bias=False, strides=stride, name=name)(x)

    # number of input channels per group
    ig = in_channels // groups
    group_list = []

    assert out_channels % groups == 0

    for i in range(groups):
        offset = i * ig
        group = Lambda(lambda z: z[:, :, :, offset: offset + ig], name='%s/g%d_slice' % (name, i))(x)
        group_list.append(Conv2D(int(0.5 + out_channels / groups), kernel_size=kernel, strides=stride,
                                 use_bias=False, padding='same', name='%s_/g%d' % (name, i))(group))
    return Concatenate(name='%s/concat' % name)(group_list)


# Implement the shuffle with the transpose trick described in the paper.
def channel_shuffle(x, groups):
    """
    Parameters
    ----------
    x:
        Input tensor with `channels_last` data format
    groups: int
        number of groups per channel

    Returns
    -------
        channel shuffled output tensor

    Examples
    --------
    Example for a 1D Array with 3 groups
    >>> d = np.array([0,1,2,3,4,5,6,7,8])
    >>> x = np.reshape(d, (3,3))
    >>> x = np.transpose(x, [1,0])
    >>> x = np.reshape(x, (9,))
    '[0 1 2 3 4 5 6 7 8] --> [0 3 6 1 4 7 2 5 8]'
    """
    height, width, in_channels = x.shape.as_list()[1:]
    channels_per_group = in_channels // groups

    # reshape to (groups, channels_per_group), swap those two axes, flatten back
    x = K.reshape(x, [-1, height, width, groups, channels_per_group])
    x = K.permute_dimensions(x, (0, 1, 2, 4, 3))  # transpose
    x = K.reshape(x, [-1, height, width, in_channels])

    return x
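A minimal usage sketch (assuming the code above is saved as `shufflenet.py`; the module name is hypothetical):

from shufflenet import ShuffleNet

# ShuffleNet 1x with g=3, global average pooling, 1000 ImageNet classes
model = ShuffleNet(scale_factor=1.0, groups=3, pooling='avg',
                   input_shape=(224, 224, 3), classes=1000)
model.summary()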
