Tool versions used for the code in this document:
PyTorch 1.6
TensorFlow 2.1
Except for the full implementations at the end, all examples in this document assume an up/down-sampling scale factor of 2.
PixelShuffle (a PyTorch method) mainly increases the spatial resolution of feature maps, changing a tensor's shape from $(N, C \times s^2, H, W)$ to $(N, C, s \times H, s \times W)$, where $s$ (the scale factor) must be an integer. It can therefore only perform upsampling, not downsampling. To some extent it can replace Upsample, and it is commonly used in image super-resolution.
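As a quick sanity check of this shape rule, here is a minimal sketch (the tensor sizes are arbitrary choices for illustration):

import torch
import torch.nn as nn

ps = nn.PixelShuffle(2)               # s = 2
x = torch.randn(1, 3 * 2 ** 2, 8, 8)  # (N, C*s^2, H, W) with N=1, C=3, s=2
print(ps(x).shape)                    # torch.Size([1, 3, 16, 16]) == (N, C, s*H, s*W)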
In TensorFlow, the corresponding method is called depth_to_space (arguably the better name), and TF also provides space_to_depth, which implements the inverse transform (downsampling).
In fact, this method was originally proposed in a super-resolution paper (ESPCN):
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
In that paper, the authors call this upsampling method sub-pixel convolution; the schematic is shown below:
The ESPCN authors later wrote a follow-up article explaining the idea:
Is the deconvolution layer the same as a convolutional layer?
The gist of that article is that sub-pixel convolution is equivalent in effect to transposed convolution, but the shuffle itself needs no learned parameters; it only rearranges the tensor, which improves the network's runtime efficiency.
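To make the comparison concrete, here is a minimal sketch of two upsampling heads that produce the same output shape (the layer sizes are arbitrary assumptions, not taken from the paper):

import torch
import torch.nn as nn

s = 2
# Sub-pixel upsampling: an ordinary convolution predicts C*s^2 channels,
# then the parameter-free shuffle rearranges them into an s-times larger map.
subpixel = nn.Sequential(
    nn.Conv2d(64, 3 * s * s, kernel_size=3, padding=1),
    nn.PixelShuffle(s),
)
# A transposed convolution producing the same output shape directly.
transposed = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=s, padding=1)

x = torch.randn(1, 64, 16, 16)
print(subpixel(x).shape)    # torch.Size([1, 3, 32, 32])
print(transposed(x).shape)  # torch.Size([1, 3, 32, 32])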
There is also a very good explanatory article (Deconvolution and Checkerboard Artifacts) that discusses the checkerboard artifacts found in neural-network-generated images. From that perspective, transposed convolution may even be worse than sub-pixel convolution: transposed convolutions usually have overlap, which the article identifies as the cause of checkerboard artifacts, whereas sub-pixel convolution naturally avoids the overlap problem.
So, all in all, sub-pixel convolution does well on both efficiency and quality. It is now used extensively in super-resolution papers, and it also sees plenty of use anywhere else upsampling is needed.
As mentioned earlier, PyTorch's PixelShuffle corresponds to TensorFlow's depth_to_space.
Here is a test example written in PyTorch with upscale_factor = 2:
import torch

a = torch.arange(36).reshape([1, 4, 3, 3])  # (N, C*s^2, H, W) with N=1, C=1, s=2
b = torch.pixel_shuffle(a, 2)               # -> shape (1, 1, 6, 6)
print(a)
print(b)
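For reference, the shuffled tensor b printed by this script has shape (1, 1, 6, 6) and holds the following 6×6 map (the same arrangement that the numpy implementation later in this document reproduces):

# [[ 0,  9,  1, 10,  2, 11],
#  [18, 27, 19, 28, 20, 29],
#  [ 3, 12,  4, 13,  5, 14],
#  [21, 30, 22, 31, 23, 32],
#  [ 6, 15,  7, 16,  8, 17],
#  [24, 33, 25, 34, 26, 35]]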
The same example can be written in TensorFlow as follows.
(Two caveats: first, dtype must be set to a float type; second, data_format must be set to "NCHW".)
import tensorflow as tf
a = tf.range(0, 36, dtype='float')
b = tf.reshape(a, [1, 4, 3, 3])
c = tf.nn.depth_to_space(b, block_size=2, data_format="NCHW")
print(b)
print(c)
The result of the example above can be illustrated by the figure below:
In the figure, the left side shows four 3×3 feature maps; after PixelShuffle they become the single-channel 6×6 feature map on the right. The change in tensor size before and after PixelShuffle follows directly from the definition. What deserves attention is how the values are rearranged: the four feature maps could be rearranged in many different ways (for example, as in Figure 3), so why use the arrangement shown in Figure 2?
Considering the background in which PixelShuffle was proposed (super-resolution), together with some intuition about convolutional networks, the key reason is that values at the same spatial position in different feature maps come from the same receptive field, and thus describe the same local region of the input.
That is why, in Figure 2, values of the same color, i.e. values at the same spatial position across the different feature maps, are placed together (in adjacent positions) during rearrangement. Whether they are laid out row-first or column-first hardly matters; since C/C++ and most GPU programming languages are row-major, every framework implements it row-first.
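This grouping is easy to check directly. In the sketch below, the four values at position (0, 0) of the four input maps end up as one adjacent 2×2 block in the output:

import torch

s = 2
x = torch.arange(36).reshape(1, 4, 3, 3)
y = torch.pixel_shuffle(x, s)
print(x[0, :, 0, 0])    # tensor([ 0,  9, 18, 27]): same position, different maps
print(y[0, 0, :s, :s])  # tensor([[ 0,  9], [18, 27]]): now one adjacent block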
The inverse of PixelShuffle should follow the same principle.
PyTorch (as of 1.6) has no inverse for PixelShuffle. TensorFlow does; in the example below, b becomes c via depth_to_space, and c becomes d via space_to_depth. Printing them makes it easy to see that b and d are identical.
import tensorflow as tf
a = tf.range(0, 36, dtype='float')
b = tf.reshape(a, [1, 4, 3, 3])
c = tf.nn.depth_to_space(b, block_size=2, data_format="NCHW")
print(b)
print(c)
d = tf.nn.space_to_depth(c, block_size=2, data_format="NCHW")
print(d)
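To verify the round trip without eyeballing the printouts, the following check can be appended to the script above (assuming eager execution, the TF 2.x default):

import numpy as np
print(np.array_equal(b.numpy(), d.numpy()))  # True: space_to_depth undoes depth_to_space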
Going by how PixelShuffle works, the simplest implementation is to write brute-force loops and assign elements one by one; however, writing loops in Python is an extremely inefficient choice. In fact, the whole thing can be implemented with numpy's reshape and transpose, without writing a single loop, which is far more efficient.
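For concreteness, the brute-force version might look like the sketch below (a hypothetical helper for a single image, using PyTorch's channel ordering); the triple loop is exactly what the reshape/transpose implementation avoids:

import numpy as np

def pixel_shuffle_loops(x, s):
    # x: a single image of shape (C*s*s, H, W); slow, for illustration only
    c, h, w = x.shape
    out = np.empty((c // (s * s), h * s, w * s), dtype=x.dtype)
    for ch in range(out.shape[0]):
        for i in range(h * s):
            for j in range(w * s):
                out[ch, i, j] = x[ch * s * s + (i % s) * s + (j % s),
                                  i // s, j // s]
    return out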
reshape changes an array's shape, for example turning an array of shape $(1 \times 24)$ into one of shape $(2 \times 3 \times 4)$.
transpose reorders an array's axes. For a 2-D array, swapping the two axes is the familiar matrix transpose; transpose generalizes this to arrays of any dimensionality.
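A small sketch of both behaviors:

import numpy as np

m = np.arange(6).reshape(2, 3)
print(np.array_equal(m.transpose(), m.T))  # True: in 2-D this is the matrix transpose

a = np.arange(24).reshape(2, 3, 4)
b = a.transpose([2, 0, 1])                 # move the last axis to the front
print(b.shape)                             # (4, 2, 3)
print(np.array_equal(b[1], a[:, :, 1]))    # True: axes reordered, data unchanged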
reshape and transpose are both very efficient operators. The reason is that neither of them rearranges the data in memory; they only modify metadata such as the array's shape and strides. For details, see the article on numpy's reshape and transpose mechanics ("numpy的reshape和transpose机制解释").
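This can be observed directly (the stride values below assume an 8-byte integer dtype, the numpy default on most 64-bit platforms):

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
print(a.strides)      # (96, 32, 8)
b = a.transpose([2, 0, 1])
print(b.strides)      # (8, 96, 32): same buffer, permuted strides
print(b.base is a)    # True: b is a view, no data was copied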
We will cover the inverse transform first and then the forward transform; the inverse is a little easier to understand.
import numpy as np

a = np.arange(36).reshape(6, 6)
print(a)
# ==>
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [18 19 20 21 22 23]
#  [24 25 26 27 28 29]
#  [30 31 32 33 34 35]]

b = a.reshape([3, 2, 6])
print(b)
# ==>
# [[[ 0  1  2  3  4  5]
#   [ 6  7  8  9 10 11]]
#
#  [[12 13 14 15 16 17]
#   [18 19 20 21 22 23]]
#
#  [[24 25 26 27 28 29]
#   [30 31 32 33 34 35]]]

print(b[0, :, :])
# ==>
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]

c = a.reshape([6, 3, 2])
print(c[:, 0, :])
# ==>
# [[ 0  1]
#  [ 6  7]
#  [12 13]
#  [18 19]
#  [24 25]
#  [30 31]]

d = a.reshape([3, 2, 3, 2])
print(d[0, :, 0, :])
# ==>
# [[0 1]
#  [6 7]]

print(d[0, :, 1, :])
# ==>
# [[2 3]
#  [8 9]]
If we print c and d directly instead of inspecting them with index slicing, the printed layout is not very friendly for understanding the problem at hand.
That is why Figure 4 was drawn. Its colors indicate that, after reshaping, the array should be read along the size-3 axes: once the array is reshaped to [3, 2, 3, 2] and read along its two size-3 axes, it splits into nine 2×2 blocks, which makes the following steps easy to interpret.
Below is the inverse PixelShuffle pipeline. How should we read the printed d? The array d contains four 3×3 blocks, and each block corresponds to a sub-array sampled with stride 2 from the 6×6 array in Figure 4.
import numpy as np

a = np.arange(36).reshape(6, 6)
b = a.reshape([3, 2, 3, 2])
c = b.transpose([1, 3, 0, 2])
d = c.reshape([-1, 3, 3])
print(d)
# ==>
# [[[ 0  2  4]
#   [12 14 16]
#   [24 26 28]]
#
#  [[ 1  3  5]
#   [13 15 17]
#   [25 27 29]]
#
#  [[ 6  8 10]
#   [18 20 22]
#   [30 32 34]]
#
#  [[ 7  9 11]
#   [19 21 23]
#   [31 33 35]]]
The next example corresponds to Figure 2. Since we already ran the Figure 2 example in PyTorch earlier, comparing with the PyTorch output also confirms that the result below is correct. The forward PixelShuffle is simply the inverse pipeline run in reverse:
import numpy as np

a = np.arange(36).reshape([4, 3, 3])
b = a.reshape([2, 2, 3, 3])
c = b.transpose([2, 0, 3, 1])
d = c.reshape([6, 6])
print(a)
# ==>
# [[[ 0  1  2]
#   [ 3  4  5]
#   [ 6  7  8]]
#
#  [[ 9 10 11]
#   [12 13 14]
#   [15 16 17]]
#
#  [[18 19 20]
#   [21 22 23]
#   [24 25 26]]
#
#  [[27 28 29]
#   [30 31 32]
#   [33 34 35]]]

print(d)
# ==>
# [[ 0  9  1 10  2 11]
#  [18 27 19 28 20 29]
#  [ 3 12  4 13  5 14]
#  [21 30 22 31 23 32]
#  [ 6 15  7 16  8 17]
#  [24 33 25 34 26 35]]
The examples above used single-channel arrays, to keep the figures and the reasoning simple. In practice, the array to be transformed is usually not single-channel, so a few extra points need attention.
The two scale_factor axes must keep their relative order, and they are placed immediately after the height and width axes respectively (PixelShuffle), or extracted from behind height and width and gathered together (inverse PixelShuffle). Below is a PyTorch-style implementation:
""" PyTorch style implementation """ import numpy as np import torch import torch.nn.functional as F def pixel_shuffle(tensor, scale_factor): """ Implementation of pixel shuffle using numpy Parameters: ----------- tensor: input tensor, shape is [N, C, H, W] scale_factor: scale factor to up-sample tensor Returns: -------- tensor: tensor after pixel shuffle, shape is [N, C/(s*s), s*H, s*W], where s refers to scale factor """ num, ch, height, width = tensor.shape if ch % (scale_factor * scale_factor) != 0: raise ValueError('channel of tensor must be divisible by ' '(scale_factor * scale_factor).') new_ch = ch // (scale_factor * scale_factor) new_height = height * scale_factor new_width = width * scale_factor tensor = tensor.reshape( [num, new_ch, scale_factor, scale_factor, height, width]) # new axis: [num, new_ch, height, scale_factor, width, scale_factor] tensor = tensor.transpose([0, 1, 4, 2, 5, 3]) tensor = tensor.reshape([num, new_ch, new_height, new_width]) return tensor def pixel_shuffle_inv(tensor, scale_factor): """ Implementation of inverted pixel shuffle using numpy Parameters: ----------- tensor: input tensor, shape is [N, C, H, W] scale_factor: scale factor to down-sample tensor Returns: -------- tensor: tensor after pixel shuffle, shape is [N, (s*s)*C, H/s, W/s], where s refers to scale factor """ num, ch, height, width = tensor.shape if height % scale_factor != 0 or width % scale_factor != 0: raise ValueError('height and widht of tensor must be divisible by ' 'scale_factor.') new_ch = ch * (scale_factor * scale_factor) new_height = height // scale_factor new_width = width // scale_factor tensor = tensor.reshape( [num, ch, new_height, scale_factor, new_width, scale_factor]) # new axis: [num, ch, scale_factor, scale_factor, new_height, new_width] tensor = tensor.transpose([0, 1, 3, 5, 2, 4]) tensor = tensor.reshape([num, new_ch, new_height, new_width]) return tensor if __name__ == '__main__': # numpy computation a = np.arange(2 * 20 * 7 * 7).reshape([2, 20, 7, 7]) b = pixel_shuffle(a, scale_factor=2).astype(np.int32) c = pixel_shuffle_inv(b, scale_factor=2).astype(np.int32) # torch computation a_torch = torch.arange(2 * 20 * 7 * 7).reshape([2, 20, 7, 7]) b_torch = F.pixel_shuffle(a_torch, upscale_factor=2) a_torch = a_torch.numpy().astype(np.int32) b_torch = b_torch.numpy().astype(np.int32) # check print(np.all(b == b_torch)) print(np.all(c == a_torch))
Next is the TensorFlow-style implementation:
""" TensorFlow style implementation """ import numpy as np import tensorflow as tf def depth_to_space(tensor, scale_factor): """ Implementation of depth to space using numpy Parameters: ----------- tensor: input tensor, shape is [N, C, H, W] scale_factor: scale factor to up-sample tensor Returns: -------- tensor: tensor after pixel shuffle, shape is [N, C/(s*s), s*H, s*W], where s refers to scale factor """ num, ch, height, width = tensor.shape if ch % (scale_factor * scale_factor) != 0: raise ValueError('channel of tensor must be divisible by ' '(scale_factor * scale_factor).') new_ch = ch // (scale_factor * scale_factor) new_height = height * scale_factor new_width = width * scale_factor tensor = tensor.reshape( [num, scale_factor, scale_factor, new_ch, height, width]) # new axis: [num, new_ch, height, scale_factor, width, scale_factor] tensor = tensor.transpose([0, 3, 4, 1, 5, 2]) tensor = tensor.reshape([num, new_ch, new_height, new_width]) return tensor def space_to_depth(tensor, scale_factor): """ Implementation of space to depth using numpy Parameters: ----------- tensor: input tensor, shape is [N, C, H, W] scale_factor: scale factor to down-sample tensor Returns: -------- tensor: tensor after pixel shuffle, shape is [N, (s*s)*C, H/s, W/s], where s refers to scale factor """ num, ch, height, width = tensor.shape if height % scale_factor != 0 or width % scale_factor != 0: raise ValueError('height and widht of tensor must be divisible by ' 'scale_factor.') new_ch = ch * (scale_factor * scale_factor) new_height = height // scale_factor new_width = width // scale_factor tensor = tensor.reshape( [num, ch, new_height, scale_factor, new_width, scale_factor]) # new axis: [num, scale_factor, scale_factor, ch, new_height, new_width] tensor = tensor.transpose([0, 3, 5, 1, 2, 4]) tensor = tensor.reshape([num, new_ch, new_height, new_width]) return tensor if __name__ == '__main__': a = np.arange(2 * 20 * 7 * 7).reshape([2, 20, 7, 7]) b = depth_to_space(a, scale_factor=2).astype(np.int32) c = space_to_depth(b, scale_factor=2).astype(np.int32) a_tf = tf.range(2 * 20 * 7 * 7, dtype='float') a_tf = tf.reshape(a_tf, [2, 20, 7, 7]) b_tf = tf.nn.depth_to_space(a_tf, block_size=2, data_format="NCHW") c_tf = tf.nn.space_to_depth(b_tf, block_size=2, data_format="NCHW") b_tf = b_tf.numpy().astype(np.int32) c_tf = c_tf.numpy().astype(np.int32) print(np.all(b == b_tf)) print(np.all(c == c_tf)) print()