Note 3: How does torch.nn.Conv2d compute the output feature-map size? How to reproduce TensorFlow's "same" and "valid" padding in PyTorch

1 How torch.nn.Conv2d works

1.1 Introduction to Conv2d


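  • stride: the step size of the cross-correlation. It can be a single int or an (int, int) tuple.
  • padding: the amount of zero-padding added to each side of the input.
  • dilation: the spacing between kernel elements. The default is 1 (no dilation). This is also known as the "à trous" algorithm; see the Dilated convolution animations on GitHub for a visualization.
  • groups: controls how input channels are connected to output channels (it is not the number of kernels). Normally there is a single group, but it can be set to any value from 1 to in_channels that divides both in_channels and out_channels: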

    At groups=1, all inputs are convolved to all outputs;
    At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated;
    At groups=in_channels, each input channel is convolved with its own set of filters (of size ⌊out_channels / in_channels⌋).


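First, a basic fact: with a 2×2 kernel, convolving 4 input channels into 2 output channels takes 2×(2×2)×4 = 32 weights [that is, out × (k × k) × in], i.e. a weight of shape (2, 2, 2, 4). The shapes in this section are written channels-last, as (out, k, k, in); PyTorch itself stores the weight as (out, in/groups, k, k). With that in mind: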

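By default groups=1, which is ordinary convolution. For example: input (7, 7, 6) → [Conv, Cout=12, ks=3, groups=1] → output (5, 5, 12). The weight is (12, 3, 3, 6), i.e. 12×6 (out × in) 3×3 kernels are needed.

With groups=3: input (7, 7, 6) → [Conv, Cout=12, ks=3, groups=3] → output (5, 5, 12). The output size is unchanged, but the weight becomes (12, 3, 3, 2), so only 12×2 kernels are needed; the computation itself has changed. The 6 input channels are split into 3 groups of 2 channels, and the 12 output channels are split into 3 groups of 4 channels. Each input group is connected only to its matching output group, which takes 4×2 + 4×2 + 4×2 = 12×2 (out × in / groups) kernels.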

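groups=3 means the input and output channels are both split into 3 groups. Concretely:

The input (7, 7, 6) is split into 3 groups of 6/3 = 2 channels, each of size (7, 7, 2):
input[:, :, 0:2], input[:, :, 2:4], input[:, :, 4:6]

The 12 filters are split into 3 groups of 12/3 = 4 filters, each group of size (4, 3, 3, 2) (note that the last dimension is 2, matching the channel count of each input group):
weight[0:4], weight[4:8], weight[8:12]

Each input group is convolved with its own filter group, giving 3 outputs of size (5, 5, 4).
Concatenating them along the channel dimension gives the final output of size (5, 5, 12).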
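The grouping arithmetic above can be sketched in a few lines of plain Python. This is only an illustrative helper (the name `grouped_conv_layout` is made up here, not a PyTorch API), and it reports the weight shape in PyTorch's (out, in/groups, kH, kW) order rather than the channels-last order used in the text.

```python
def grouped_conv_layout(in_channels, out_channels, kernel_size, groups):
    """Weight shape and per-group channel ranges of a grouped convolution.

    Hypothetical helper for illustration only; shapes follow PyTorch's
    (out, in/groups, kH, kW) ordering.
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    weight_shape = (out_channels, in_per_group, kernel_size, kernel_size)
    # Group g reads input channels [g*in_per_group, (g+1)*in_per_group)
    # and owns filters [g*out_per_group, (g+1)*out_per_group).
    slices = [
        ((g * in_per_group, (g + 1) * in_per_group),
         (g * out_per_group, (g + 1) * out_per_group))
        for g in range(groups)
    ]
    num_kernels = out_channels * in_per_group  # number of k x k kernels
    return weight_shape, slices, num_kernels


shape, slices, n = grouped_conv_layout(6, 12, 3, groups=3)
print(shape)   # (12, 2, 3, 3)
print(slices)  # [((0, 2), (0, 4)), ((2, 4), (4, 8)), ((4, 6), (8, 12))]
print(n)       # 24
```

With groups=1 the same helper returns 12×6 = 72 kernels, which matches the ordinary-convolution case described earlier.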


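Note: kernel_size, stride, padding and dilation can each be a single int, which is then used for both height and width, or an (int1, int2) tuple (a single int is just shorthand for the tuple (int, int)). In a tuple, the first value applies to the height dimension and the second to the width dimension.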



1.2  The effect of dilation (receptive field of a dilated convolution)




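dilation defaults to 1, which means ordinary (non-dilated) convolution. Intuitively, the parameter is the distance travelled from one kernel element to the next. The default has to be 1, since two different kernel elements cannot occupy the same position (which is what 0 would mean).

For a more intuitive picture, see the Dilated convolution animations on GitHub, which show the dilation=2 case.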


(1) Dilated convolution on an ordinary image

  size=(dilation-1)*(kernel_size-1) + kernel_size  # effective kernel size; kernel_size is the size of the convolution kernel
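As a quick check of this formula, here is a small sketch (the function name `effective_kernel_size` is only illustrative). Note that the expression is algebraically the same as dilation*(kernel_size-1)+1.

```python
def effective_kernel_size(kernel_size, dilation=1):
    """Span covered by a dilated kernel along one axis."""
    return (dilation - 1) * (kernel_size - 1) + kernel_size


for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d))
# 1 3
# 2 5
# 3 7
```

So a 3×3 kernel with dilation=2 covers the same 5×5 window as a 5×5 kernel, while still using only 9 weights.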



  1. nn.MaxPool2d(kernel_size,.....,dilation)
  2. nn.AvgPool2d(kernel_size, .....,dilation)    

With a dilation rate of dilation:

  size=2 *(dilation-1)*(kernel_size-1)+kernel_size        # kernel_size is the size of the convolution kernel

1.3  The padding operation in nn.Conv2d

Q1: Is padding applied before or after the convolution?

The following forum question shows an example with padding = 1:



  Hello, I just can't figure out how nn.Conv2d calculates its output.
  The result calculated by torch is not the same as what some machine learning courses teach.
  For example, take the code below:
  >> m = torch.nn.Conv2d(1, 1, 3, padding=0)
  >> m(input)
  tensor([[[[ 0.5142, 0.3803, 0.2687],
            [-0.4321, 1.1637, 1.0675],
            [ 0.1742, 0.0869, -0.4451]]]], grad_fn=<ThnnConv2DBackward>)
  >> input
  tensor([[[[ 0.7504, 0.1157, 1.4940, -0.2619, -0.4732],
            [ 0.1497, 0.0805, 2.0829, -0.0925, -1.3367],
            [ 1.7471, 0.5205, -0.8532, -0.7358, -1.3931],
            [ 0.1159, -0.2376, 1.2683, -0.0959, -1.3171],
            [-0.1620, -1.8539, 0.0893, -0.0568, -0.0758]]]])
  >> m.weight
  Parameter containing:
  tensor([[[[ 0.2405, 0.3018, 0.0011],
            [-0.1691, -0.0701, -0.0334],
            [-0.0429, 0.2668, -0.2152]]]], requires_grad=True)
  For the top-left element 0.5142, the output does not equal the following:
  >> import numpy as np
  >> w = np.array([[0.2405, 0.3018, 0.0011],
                   [-0.1691, -0.0701, -0.0334],
                   [-0.0429, 0.2668, -0.2152]])
  # top-left 3x3 matrix of 5x5
  >> x = np.array([[ 0.7504, 0.1157, 1.4940],
                   [ 0.1497, 0.0805, 2.0829],
                   [1.7471, 0.5205, -0.8532]])
  >> print(np.sum(w*x))
  # 0.364034 != 0.5142
  0.36403412999999996
  My question here is: why does the output not equal 0.5142?
  Furthermore, when I add the padding parameter to nn.Conv2d,
  the outcome seems obscure to me, as shown below. Thanks a lot for explaining that to me.
  >> input
  tensor([[[[ 0.7504, 0.1157, 1.4940, -0.2619, -0.4732],
            [ 0.1497, 0.0805, 2.0829, -0.0925, -1.3367],
            [ 1.7471, 0.5205, -0.8532, -0.7358, -1.3931],
            [ 0.1159, -0.2376, 1.2683, -0.0959, -1.3171],
            [-0.1620, -1.8539, 0.0893, -0.0568, -0.0758]]]])
  # set padding from 0 to 1, which is equal to (1, 1)
  >> m1 = torch.nn.Conv2d(1, 1, 1, padding=1)
  >> m1(input)
  tensor([[[[0.9862, 0.9862, 0.9862, 0.9862, 0.9862, 0.9862, 0.9862],
            [0.9862, 1.0771, 1.0002, 1.1672, 0.9544, 0.9288, 0.9862],
            [0.9862, 1.0043, 0.9959, 1.2385, 0.9749, 0.8242, 0.9862],
            [0.9862, 1.1978, 1.0492, 0.8828, 0.8970, 0.8174, 0.9862],
            [0.9862, 1.0002, 0.9574, 1.1398, 0.9745, 0.8266, 0.9862],
            [0.9862, 0.9665, 0.7615, 0.9970, 0.9793, 0.9770, 0.9862],
            [0.9862, 0.9862, 0.9862, 0.9862, 0.9862, 0.9862, 0.9862]]]],
         grad_fn=<ThnnConv2DBackward>)
  The confusing point is: how is 0.9862 calculated?
  And what is the default padding strategy in nn.Conv2d?
  Thank you for reading and answering!
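One reading that is consistent with every number printed in the question: nn.Conv2d zero-pads the input first, then slides the kernel over it (cross-correlation), and finally adds a bias term, which is enabled by default (bias=True). The plain-Python sketch below re-implements that order of operations for a single channel with stride 1. The bias 0.1502 and the 1×1 weight 0.1211 do not appear in the post; they are back-computed here from the printed outputs and should be treated as assumptions.

```python
def conv2d_single(x, w, bias=0.0, padding=0):
    """Single-channel 2-D cross-correlation, stride 1, zero padding applied first."""
    h, wd = len(x), len(x[0])
    kh, kw = len(w), len(w[0])
    # Step 1: zero-pad the input BEFORE any multiplication happens.
    padded = [[0.0] * (wd + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(wd):
            padded[r + padding][c + padding] = x[r][c]
    # Step 2: slide the kernel and add the bias to every output element.
    out = []
    for r in range(h + 2 * padding - kh + 1):
        row = []
        for c in range(wd + 2 * padding - kw + 1):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += w[i][j] * padded[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


x = [[ 0.7504,  0.1157,  1.4940, -0.2619, -0.4732],
     [ 0.1497,  0.0805,  2.0829, -0.0925, -1.3367],
     [ 1.7471,  0.5205, -0.8532, -0.7358, -1.3931],
     [ 0.1159, -0.2376,  1.2683, -0.0959, -1.3171],
     [-0.1620, -1.8539,  0.0893, -0.0568, -0.0758]]
w3 = [[ 0.2405,  0.3018,  0.0011],
      [-0.1691, -0.0701, -0.0334],
      [-0.0429,  0.2668, -0.2152]]

raw = conv2d_single(x, w3)                 # no bias: reproduces the NumPy result
bias_guess = 0.5142 - raw[0][0]            # assumption: the gap is the bias
with_bias = conv2d_single(x, w3, bias=bias_guess)
print(round(bias_guess, 4))                # 0.1502
print(abs(with_bias[0][1] - 0.3803) < 1e-3)  # True: second output matches too

# 1x1 kernel with padding=1: border windows see only zeros, so they equal the bias.
w1 = (1.0771 - 0.9862) / 0.7504            # assumption: weight inferred from one output
padded_out = conv2d_single(x, [[w1]], bias=0.9862, padding=1)
print(len(padded_out), len(padded_out[0]))  # 7 7
print(padded_out[0][0])                     # 0.9862
```

Under this reading, 0.9862 is simply the bias of m1, and the default padding strategy is padding=0, i.e. no padding at all.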





2 Reproducing TensorFlow's "same" and "valid" padding in torch

2.0 Introduction

The same question was asked in "How to keep the shape of input and output same when dilation conv?", which already has answers. The problem, in short:



model.add(Conv2D(256, kernel_size=3, strides=1, padding='same', dilation_rate=(2, 2)))

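This keeps the input and output the same size, both 32×32, because padding='same'. In PyTorch, however, using:

torch.nn.Conv2d(3,256,3,1,1, dilation=2,bias=False)

shrinks a 32×32 input to 30×30, so the padding has to be worked out by hand from the following quantities: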





  • o = output size = 32
  • i = input size = 32
  • p = padding = ?  # unknown, to be solved for
  • k = kernel_size = 3
  • s = stride = 1
  • d = dilation = 2
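These quantities are related by the output-size formula given in the torch.nn.Conv2d documentation. A worked sketch with the values above (the floor has no effect here because s = 1):

```latex
\begin{aligned}
o  &= \left\lfloor \frac{i + 2p - d\,(k-1) - 1}{s} \right\rfloor + 1 \\
32 &= \left\lfloor \frac{32 + 2p - 2\,(3-1) - 1}{1} \right\rfloor + 1 = 2p + 28 \\
p  &= 2
\end{aligned}
```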







Solving the output-size formula for p gives p = 2, so the PyTorch equivalent is:

torch.nn.Conv2d(in_channels=3, out_channels=256,kernel_size=3,stride=1,padding=2, dilation=2,bias=True)


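Tip: with stride=1 and dilation=1, padding=(kernel_size-1)/2 keeps the convolution output the same size as the input (for odd kernel_size). With dilation, the same reasoning gives padding=dilation*(kernel_size-1)/2, which is 2 for kernel_size=3 and dilation=2.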
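The rule of thumb can be checked numerically with a small sketch of the output-size formula from the torch.nn.Conv2d documentation. The function name `conv_output_size` is only illustrative, and the formula is applied to one spatial axis at a time.

```python
def conv_output_size(i, k, s=1, p=0, d=1):
    """Output length along one axis: floor((i + 2p - d(k-1) - 1) / s) + 1."""
    return (i + 2 * p - d * (k - 1) - 1) // s + 1


print(conv_output_size(32, 3, s=1, p=1, d=2))  # 30
print(conv_output_size(32, 3, s=1, p=2, d=2))  # 32
print(conv_output_size(7, 3))                  # 5

# Stride-1 rule of thumb: p = d * (k - 1) // 2 preserves the size for odd k.
for k, d in [(3, 1), (5, 1), (3, 2), (7, 3)]:
    p = d * (k - 1) // 2
    assert conv_output_size(32, k, p=p, d=d) == 32
```

For an even kernel_size the required total padding is odd, so a single symmetric padding value cannot preserve the size exactly; that case needs asymmetric padding.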



2.1 "valid" padding in PyTorch





  >>> input = torch.FloatTensor([[[1,2,3,4,5,6,7,8,9,10,11,12,13]]])
  >>> input
  (0 ,.,.) =
  1 2 3 4 5 6 7 8 9 10 11 12 13
  [torch.FloatTensor of size 1x1x13]  # input length is 13
  >>> conv = torch.nn.Conv1d(1,1,6,5)  # 1-D convolution with kernel_size=6, stride=5
  >>> input.size()
  torch.Size([1, 1, 13])
  >>> input = torch.autograd.Variable(input)
  >>> input
  Variable containing:
  (0 ,.,.) =
  1 2 3 4 5 6 7 8 9 10 11 12 13
  [torch.FloatTensor of size 1x1x13]
  >>> output = conv(input)
  >>> output.size()
  torch.Size([1, 1, 2])  # output length is 2
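To see why the output length is 2, it helps to list the windows the kernel actually visits. The sketch below (the helper `valid_windows` is hypothetical, not a PyTorch function) enumerates them for a "valid" convolution, i.e. one with no padding, and reports which trailing inputs are never read.

```python
def valid_windows(length, kernel_size, stride):
    """Start/end indices (end exclusive) of each window when no padding is used."""
    n_out = (length - kernel_size) // stride + 1
    windows = [(j * stride, j * stride + kernel_size) for j in range(n_out)]
    covered_end = windows[-1][1] if windows else 0
    dropped = list(range(covered_end, length))  # trailing positions never read
    return windows, dropped


data = list(range(1, 14))  # the values 1..13 from the example
windows, dropped = valid_windows(len(data), kernel_size=6, stride=5)
print(windows)                     # [(0, 6), (5, 11)]
print([data[i] for i in dropped])  # [12, 13]
```

A third window would need inputs up to index 15, which do not exist, so the values 12 and 13 are silently discarded. This is the behaviour TensorFlow calls "valid".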


2.2 "same" padding in PyTorch


  import torch.nn.functional as F

  def conv2d_same_padding(input, weight, bias=None, stride=(1, 1), padding=1, dilation=(1, 1), groups=1):
      # The padding argument is ignored: this function always applies "same" padding.
      # stride and dilation must be (rows, cols) pairs, as produced by _pair() in Conv2d.
      input_rows, input_cols = input.size(2), input.size(3)
      filter_rows, filter_cols = weight.size(2), weight.size(3)
      # "same" output size is ceil(input / stride)
      out_rows = (input_rows + stride[0] - 1) // stride[0]
      out_cols = (input_cols + stride[1] - 1) // stride[1]
      # total padding needed along each axis
      padding_rows = max(0, (out_rows - 1) * stride[0] +
                         (filter_rows - 1) * dilation[0] + 1 - input_rows)
      padding_cols = max(0, (out_cols - 1) * stride[1] +
                         (filter_cols - 1) * dilation[1] + 1 - input_cols)
      rows_odd = (padding_rows % 2 != 0)
      cols_odd = (padding_cols % 2 != 0)
      # F.conv2d pads symmetrically, so add the odd extra row/column at the bottom/right first
      if rows_odd or cols_odd:
          input = F.pad(input, [0, int(cols_odd), 0, int(rows_odd)])
      return F.conv2d(input, weight, bias, stride,
                      padding=(padding_rows // 2, padding_cols // 2),
                      dilation=dilation, groups=groups)
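The padding arithmetic in this function can be checked on its own, without tensors. Below is a minimal one-axis sketch; the helper name `same_padding_1d` is made up for illustration, and it assumes the TensorFlow convention that the odd extra cell goes at the end (bottom/right).

```python
def same_padding_1d(in_size, kernel_size, stride=1, dilation=1):
    """TensorFlow-style "same" padding along one axis.

    Returns (out_size, pad_before, pad_after).
    """
    out_size = (in_size + stride - 1) // stride      # ceil(in_size / stride)
    effective_k = (kernel_size - 1) * dilation + 1   # span of the dilated kernel
    total = max(0, (out_size - 1) * stride + effective_k - in_size)
    pad_before = total // 2
    pad_after = total - pad_before                   # gets the extra cell if total is odd
    return out_size, pad_before, pad_after


print(same_padding_1d(32, 3, stride=1, dilation=2))  # (32, 2, 2)
print(same_padding_1d(13, 6, stride=5))              # (3, 1, 2)
```

The first call reproduces padding=2 for the 32×32 dilated example; the second shows that the length-13 input with kernel_size=6 and stride=5 yields 3 outputs under "same" instead of the 2 obtained without padding. As far as I know, recent PyTorch releases (1.9 and later) also accept padding='same' directly in nn.Conv2d, but only for stride 1.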


  import torch.utils.data
  from torch.nn import functional as F
  import math
  import torch
  from torch.nn.parameter import Parameter
  from torch.nn.functional import pad
  from torch.nn.modules import Module
  from torch.nn.modules.utils import _single, _pair, _triple


  class _ConvNd(Module):

      def __init__(self, in_channels, out_channels, kernel_size, stride,
                   padding, dilation, transposed, output_padding, groups, bias):
          super(_ConvNd, self).__init__()
          if in_channels % groups != 0:
              raise ValueError('in_channels must be divisible by groups')
          if out_channels % groups != 0:
              raise ValueError('out_channels must be divisible by groups')
          self.in_channels = in_channels
          self.out_channels = out_channels
          self.kernel_size = kernel_size
          self.stride = stride
          self.padding = padding
          self.dilation = dilation
          self.transposed = transposed
          self.output_padding = output_padding
          self.groups = groups
          if transposed:
              self.weight = Parameter(torch.Tensor(
                  in_channels, out_channels // groups, *kernel_size))
          else:
              self.weight = Parameter(torch.Tensor(
                  out_channels, in_channels // groups, *kernel_size))
          if bias:
              self.bias = Parameter(torch.Tensor(out_channels))
          else:
              self.register_parameter('bias', None)
          self.reset_parameters()

      def reset_parameters(self):
          n = self.in_channels
          for k in self.kernel_size:
              n *= k
          stdv = 1. / math.sqrt(n)
          self.weight.data.uniform_(-stdv, stdv)
          if self.bias is not None:
              self.bias.data.uniform_(-stdv, stdv)

      def __repr__(self):
          s = ('{name}({in_channels}, {out_channels}, kernel_size={kernel_size}'
               ', stride={stride}')
          if self.padding != (0,) * len(self.padding):
              s += ', padding={padding}'
          if self.dilation != (1,) * len(self.dilation):
              s += ', dilation={dilation}'
          if self.output_padding != (0,) * len(self.output_padding):
              s += ', output_padding={output_padding}'
          if self.groups != 1:
              s += ', groups={groups}'
          if self.bias is None:
              s += ', bias=False'
          s += ')'
          return s.format(name=self.__class__.__name__, **self.__dict__)


  class Conv2d(_ConvNd):

      def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                   padding=0, dilation=1, groups=1, bias=True):
          kernel_size = _pair(kernel_size)
          stride = _pair(stride)
          padding = _pair(padding)
          dilation = _pair(dilation)
          super(Conv2d, self).__init__(
              in_channels, out_channels, kernel_size, stride, padding, dilation,
              False, _pair(0), groups, bias)

      # Changed here: forward calls conv2d_same_padding (defined above) instead of F.conv2d
      def forward(self, input):
          return conv2d_same_padding(input, self.weight, self.bias, self.stride,
                                     self.padding, self.dilation, self.groups)



