
Deep Learning 02 - Backpropagation (Backward Propagation)


Purpose:

Compute the gradients of the neural network's parameters with respect to the cost.

Method:

The chain rule: traverse the network in reverse order, starting from the output layer, and compute the gradient of each intermediate variable and parameter in turn.
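As a compact statement of the rule (using notation matching the layer equations below, where a_{\text{prev}} denotes the activation that the weight w multiplies, z = w\,a_{\text{prev}} + b and a = f(z)):

\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\cdot \frac{\partial a}{\partial z}\cdot \frac{\partial z}{\partial w} = \frac{\partial C}{\partial a}\cdot f'(z)\cdot a_{\text{prev}}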


Network layers:

In a 4-layer network:

Input layer

x_i = a_i^{(1)}, \quad i \in \{1,2,3,4\}

Second layer

z^{(2)} = W^{(1)}x+b^{(1)}; a^{(2)} = f(z^{(2)})

where W is the weight matrix, b is the bias, f is the activation function (tanh, ReLU, sigmoid, etc.), and a is the resulting activation.

Third layer

z^{(3)} = W^{(2)}a^{(2)}+b^{(2)}; a^{(3)}=f(z^{(3)})

⚠️ Note: each W is a matrix and each b a (column) vector; the superscript in the upper right denotes the layer it belongs to.

For example, the weight matrix feeding the second layer is W^{(1)}=\begin{bmatrix} W_{11}^{(1)} &W_{12}^{(1)} &W_{13}^{(1)} &W_{14}^{(1)} \\ W_{21}^{(1)} &W_{22}^{(1)} & W_{23}^{(1)} & W_{24}^{(1)} \end{bmatrix}

Output layer

s = W^{(3)}a^{(3)}, where s is the predicted output.

For a training example (x, y), x is the input and y is the label (target output). y is compared with s through a cost function (cost function) C = cost(s, y); cross-entropy is commonly used.
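To make the notation concrete, here is a minimal sketch of this forward pass in PyTorch. The layer sizes (4-2-2-2), the tanh activation, and the use of cross-entropy on the scores s are illustrative assumptions, not something fixed by the text above.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Illustrative sizes: 4 inputs, two hidden layers of width 2, 2 output scores
    W1, b1 = torch.randn(2, 4, requires_grad=True), torch.zeros(2, requires_grad=True)
    W2, b2 = torch.randn(2, 2, requires_grad=True), torch.zeros(2, requires_grad=True)
    W3     = torch.randn(2, 2, requires_grad=True)

    x = torch.randn(4)           # input a^(1)
    y = torch.tensor(1)          # class label

    z2 = W1 @ x + b1             # z^(2) = W^(1) x + b^(1)
    a2 = torch.tanh(z2)          # a^(2) = f(z^(2))
    z3 = W2 @ a2 + b2            # z^(3) = W^(2) a^(2) + b^(2)
    a3 = torch.tanh(z3)          # a^(3) = f(z^(3))
    s  = W3 @ a3                 # predicted output s = W^(3) a^(3)

    C = F.cross_entropy(s.unsqueeze(0), y.unsqueeze(0))  # C = cost(s, y)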

Backpropagation adjusts the weights and biases so as to minimize the cost function:

w := w-\epsilon \frac{\partial C}{\partial w}

b := b-\epsilon \frac{\partial C}{\partial b}

where \epsilon is the learning rate.
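Continuing the sketch above: once backpropagation has filled in \partial C/\partial w and \partial C/\partial b (here via autograd), the update is one line per parameter. The learning rate 0.1 is an arbitrary choice for illustration.

    lr = 0.1                     # epsilon, the learning rate
    C.backward()                 # backpropagation: fills .grad for every parameter

    with torch.no_grad():
        for p in (W1, b1, W2, b2, W3):
            p -= lr * p.grad     # w := w - epsilon * dC/dw,  b := b - epsilon * dC/db
            p.grad = None        # clear the gradient for the next iteration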

As a concrete example, consider the partial derivative of the cost with respect to w_{22}^{(2)} in this network. Since z^{(3)} = W^{(2)}a^{(2)}+b^{(2)}, the weight w_{22}^{(2)} influences C only through z_2^{(3)}, which it enters multiplied by the activation a_2^{(2)}. Applying the chain rule:

\frac{\partial C}{\partial w_{22}^{(2)}} = \frac{\partial C}{\partial z_2^{(3)}}\cdot \frac{\partial z_2^{(3)}}{\partial w_{22}^{(2)}} = \frac{\partial C}{\partial a_2^{(3)}} \cdot \frac{\partial a_2^{(3)}}{\partial z_2^{(3)}}\cdot a_2^{(2)}

The same idea carries over to convolution layers. The code below is the backward pass of a convolution layer: given the upstream gradient dout, it revisits every position at which each filter was applied in the forward pass and accumulates the gradients db, dw and dx.

    import torch

    def conv_backward(dout, cache):
        """
        Naive backward pass for a convolution layer.

        dout: upstream gradient of shape (N, F, H_dout, W_dout)
        cache: (x, w, b, conv_param) saved by the forward pass, where
          x - input data (the indexing below treats it as already zero-padded)
          w - filter weights of shape (F, C, HH, WW)
          b - biases of shape (F,)
          conv_param - a dictionary with the following keys:
            'stride': the number of pixels between adjacent receptive fields
                      in the horizontal and vertical directions
            'pad': the number of pixels used to zero-pad the input
        """
        x, w, b, conv_param = cache
        pad = conv_param['pad']             # padding
        stride = conv_param['stride']       # stride
        N, F, H_dout, W_dout = dout.shape   # N: batch size, F: number of filters,
                                            # H_dout / W_dout: output height / width
        F, C, HH, WW = w.shape              # filter weights of shape (F, C, HH, WW)

        # initialize db, dw, dx
        db = torch.zeros_like(b)
        dw = torch.zeros_like(w)
        dx = torch.zeros_like(x)

        for n in range(N):
            for f in range(F):
                for height in range(H_dout):
                    for width in range(W_dout):
                        # each output pixel carries a gradient dout[n, f, height, width];
                        # push it back onto the bias, the filter, and the input window
                        db[f] += dout[n, f, height, width]
                        dw[f] += x[n, :, height * stride:height * stride + HH,
                                   width * stride:width * stride + WW] * dout[n, f, height, width]
                        dx[n, :, height * stride:height * stride + HH,
                           width * stride:width * stride + WW] += w[f] * dout[n, f, height, width]

        if pad > 0:
            dx = dx[:, :, pad:-pad, pad:-pad]  # drop the zero-padded border ("delete padded pixels")

        return dx, dw, db
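A hypothetical way to sanity-check this routine (assuming it has been wrapped in a function named conv_backward as above, and that the forward pass caches the zero-padded input) is to compare its outputs against the gradients autograd computes for torch.nn.functional.conv2d:

    import torch
    import torch.nn.functional as Fn

    stride, pad = 1, 1
    x = torch.randn(2, 3, 8, 8, requires_grad=True)   # N=2 images, C=3 channels
    w = torch.randn(4, 3, 3, 3, requires_grad=True)   # F=4 filters of size 3x3
    b = torch.randn(4, requires_grad=True)

    out = Fn.conv2d(x, w, b, stride=stride, padding=pad)
    dout = torch.randn_like(out)                       # an arbitrary upstream gradient
    out.backward(dout)                                 # autograd's reference gradients

    x_pad = Fn.pad(x.detach(), (pad, pad, pad, pad))   # conv_backward indexes the padded input
    cache = (x_pad, w.detach(), b.detach(), {'stride': stride, 'pad': pad})
    dx, dw, db = conv_backward(dout, cache)

    print(torch.allclose(dx, x.grad, atol=1e-4),
          torch.allclose(dw, w.grad, atol=1e-4),
          torch.allclose(db, b.grad, atol=1e-4))       # expect: True True True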

Appendix:

1. The role of the number of convolution kernels (filters)

The number of convolution kernels (filters) equals the number of neurons: each neuron performs a different convolution on the layer's input.

Each feature map is the result of applying one filter (with the given padding and stride) to the input.
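A small illustration of this point (the sizes here are arbitrary): a convolution layer with 8 filters turns a 3-channel image into 8 feature maps, one per filter.

    import torch

    conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
    img = torch.randn(1, 3, 32, 32)   # one 3-channel 32x32 image
    fmap = conv(img)
    print(conv.weight.shape)          # torch.Size([8, 3, 3, 3]) -> 8 filters, each 3x3 over 3 channels
    print(fmap.shape)                 # torch.Size([1, 8, 32, 32]) -> 8 feature maps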

 

2. Multi-channel convolution kernels
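A multi-channel kernel has one 2-D slice per input channel, and its output is the sum of the per-channel convolutions. A quick numerical check of this (sizes arbitrary):

    import torch
    import torch.nn.functional as Fn

    x = torch.randn(1, 3, 5, 5)                # 3-channel input
    w = torch.randn(1, 3, 3, 3)                # one kernel with 3 channel slices

    full = Fn.conv2d(x, w)                     # multi-channel convolution
    per_channel = sum(
        Fn.conv2d(x[:, c:c+1], w[:, c:c+1])    # convolve each channel with its own slice
        for c in range(3)
    )
    print(torch.allclose(full, per_channel, atol=1e-5))  # True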

3. Padding & stride

Convolution layer, Padding, Stride, and Pooling in CNN
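For reference, the standard relationship between input size, padding, stride and filter size (an HH \times WW filter applied to an H \times W input) is:

H_{out} = \left\lfloor \frac{H + 2\cdot\text{pad} - HH}{\text{stride}} \right\rfloor + 1, \qquad W_{out} = \left\lfloor \frac{W + 2\cdot\text{pad} - WW}{\text{stride}} \right\rfloor + 1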

