Computing the gradients of a neural network
Chain rule: traverse the network in reverse order, starting from the output layer, and compute the gradient of each intermediate variable and parameter in turn.
In a 4-layer network, each layer $l$ computes
$z^l = W^l\,a(z^{l-1}) + b^l$ (with $z^1 = W^1 x + b^1$ for the first layer),
where W is the weight, b is the bias, and a is the activation function (tanh, ReLU, sigmoid, etc.).
⚠️ Note: W and b are matrices, and the superscript on the upper right denotes the layer number.
For example, the second layer computes $z^2 = W^2\,a(z^1) + b^2$; s, the activation of the final layer, is the predicted output.
Given training data $(x, y)$, where x is the input data and y is the label (target output), y is compared against the prediction s through a cost function; cross-entropy is commonly used.
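For concreteness, here is a minimal sketch of the cross-entropy cost $C = -\sum_i y_i \log s_i$ for a single example (the 3-class scores and the one-hot label below are made-up illustrative values, not taken from the original):

```python
import torch

# hypothetical example values: 3-class prediction scores and a one-hot label
logits = torch.tensor([2.0, 0.5, -1.0])   # raw network output
s = torch.softmax(logits, dim=0)          # predicted probabilities
y = torch.tensor([1.0, 0.0, 0.0])         # one-hot label

# cross-entropy cost C = -sum_i y_i * log(s_i)
C = -(y * torch.log(s)).sum()
print(C)  # equals torch.nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))
```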
Backpropagation adjusts the weights and biases by minimizing the cost function.
$W^l \leftarrow W^l - \epsilon\,\dfrac{\partial C}{\partial W^l}$, $\quad b^l \leftarrow b^l - \epsilon\,\dfrac{\partial C}{\partial b^l}$, where $\epsilon$ is the learning rate.
In this network we therefore need the partial derivatives of the cost C with respect to every $W^l$ and $b^l$.
Since C depends on $W^l$ only through $z^l$, the chain rule gives $\dfrac{\partial C}{\partial W^l} = \dfrac{\partial C}{\partial z^l}\,\dfrac{\partial z^l}{\partial W^l}$ (and similarly for $b^l$).
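As a hedged, minimal sketch of this chain-rule computation (the single-layer shapes, the softmax output, and the variable names are illustrative assumptions, not taken from the original), the hand-derived gradients can be checked against PyTorch autograd and then used in one gradient-descent step:

```python
import torch

# A tiny one-layer example of the chain rule: z = W x + b, s = softmax(z),
# C = cross-entropy(s, y). Shapes are illustrative assumptions.
torch.manual_seed(0)
W = torch.randn(3, 4, requires_grad=True)
b = torch.randn(3, requires_grad=True)
x = torch.randn(4)
y = torch.tensor([1.0, 0.0, 0.0])  # one-hot label

z = W @ x + b
s = torch.softmax(z, dim=0)
C = -(y * torch.log(s)).sum()

# Chain rule by hand: dC/dz = s - y (softmax followed by cross-entropy),
# dC/dW = (dC/dz) outer x, dC/db = dC/dz
dC_dz = (s - y).detach()
dC_dW = dC_dz.unsqueeze(1) * x.unsqueeze(0)
dC_db = dC_dz

# Autograd computes the same gradients by traversing the graph backwards
C.backward()
print(torch.allclose(dC_dW, W.grad), torch.allclose(dC_db, b.grad))  # True True

# One gradient-descent update with learning rate epsilon (cf. the update rule above)
epsilon = 0.1
with torch.no_grad():
    W -= epsilon * W.grad
    b -= epsilon * b.grad
```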
```python
import torch

def conv_backward(dout, cache):
    '''
    Naive backward pass for a convolutional layer.

    dout: upstream gradients of shape (N, F, H_dout, W_dout)
    cache: (x, w, b, conv_param) saved by the forward pass, where
      x is the input data (zero-padded, as cached by the forward pass)
      w is the filter weights of shape (F, C, HH, WW)
      b is the bias
      conv_param: A dictionary with the following keys:
        - 'stride': The number of pixels between adjacent receptive fields
          in the horizontal and vertical directions.
        - 'pad': The number of pixels that is used to zero-pad the input.
    '''
    x, w, b, conv_param = cache
    pad = conv_param['pad']        # padding
    stride = conv_param['stride']  # stride

    # N: batch size, F: number of filters, H_dout/W_dout: output height/width
    N, F, H_dout, W_dout = dout.shape
    # Filter weights of shape (F, C, HH, WW)
    F, C, HH, WW = w.shape

    # init db, dw, dx
    db = torch.zeros_like(b)
    dw = torch.zeros_like(w)
    dx = torch.zeros_like(x)

    for n in range(N):
        for f in range(F):
            for height in range(H_dout):
                for width in range(W_dout):
                    # each upstream gradient flows back to the bias, to the
                    # filter weights, and to the input patch it was computed from
                    db[f] += dout[n, f, height, width]
                    dw[f] += x[n, :, height * stride:height * stride + HH,
                                     width * stride:width * stride + WW] * dout[n, f, height, width]
                    dx[n, :, height * stride:height * stride + HH,
                             width * stride:width * stride + WW] += w[f] * dout[n, f, height, width]

    # remove the zero-padded border so dx matches the unpadded input size
    if pad > 0:
        dx = dx[:, :, pad:-pad, pad:-pad]

    return dx, dw, db
```
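To sanity-check the loops above, here is a hedged verification sketch (it assumes the forward pass cached the already zero-padded input and uses the conv_backward wrapper shown above; the sizes are arbitrary), comparing the naive gradients against PyTorch autograd on torch.nn.functional.conv2d:

```python
import torch
import torch.nn.functional as F

# Hypothetical check of the naive backward pass against autograd.
torch.manual_seed(0)
N, C, H, W_in = 2, 3, 5, 5
F_out, HH, WW = 4, 3, 3
conv_param = {'stride': 1, 'pad': 1}

x = torch.randn(N, C, H, W_in, requires_grad=True)
w = torch.randn(F_out, C, HH, WW, requires_grad=True)
b = torch.randn(F_out, requires_grad=True)

# reference gradients from autograd
out = F.conv2d(x, w, b, stride=conv_param['stride'], padding=conv_param['pad'])
dout = torch.randn_like(out)
out.backward(dout)

# naive gradients; the cache holds the zero-padded input, as conv_backward expects
x_padded = F.pad(x.detach(), (1, 1, 1, 1))  # pad = 1 on each side
cache = (x_padded, w.detach(), b.detach(), conv_param)
dx, dw, db = conv_backward(dout, cache)

print(torch.allclose(dx, x.grad, atol=1e-5),
      torch.allclose(dw, w.grad, atol=1e-5),
      torch.allclose(db, b.grad, atol=1e-5))  # True True True
```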
The number of convolution kernels (filters) equals the number of neurons; each neuron performs a different convolution on the layer's input.
Each feature map is the output produced by sliding one filter over the input with the given stride.
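As a small arithmetic illustration (the helper name feature_map_size is made up; the formula itself is the standard output-size relation for a convolution layer), the feature-map size follows from the input size, filter size, padding, and stride:

```python
# Feature-map size from input size, filter size, padding, and stride:
# H_out = (H + 2*pad - HH) // stride + 1   (and likewise for the width)
def feature_map_size(H, W, HH, WW, pad, stride):
    H_out = (H + 2 * pad - HH) // stride + 1
    W_out = (W + 2 * pad - WW) // stride + 1
    return H_out, W_out

# e.g. a 32x32 input with a 3x3 filter, pad=1, stride=1 keeps the size at 32x32
print(feature_map_size(32, 32, 3, 3, pad=1, stride=1))  # (32, 32)
# with stride=2 the feature map shrinks to 16x16
print(feature_map_size(32, 32, 3, 3, pad=1, stride=2))  # (16, 16)
```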
References
1. Understanding Backpropagation Algorithm
2. Convolution layer, Padding, Stride, and Pooling in CNN