CS231N only provides the backward gradient for softmax, so I tried writing the sigmoid version myself. The backward pass below works only for binary classification, and the labels must be a plain list of the form [0, 1, 0, 1, ...]. During gradient checking, the gradient from CS231N's numerical-gradient function always differed from my analytic gradient by exactly a factor of 2. It turned out that the loss counts both the 1-label term and the 0-label term for every sample, so each sample is counted twice, as if the data size were 2N. A short derivation is sketched below.
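To make the factor of 2 concrete, here is my sketch of the derivation under the conventions of the code below (p = sigmoid(x) is the N×2 probability matrix, ỹ the N×2 one-hot labels); this is my reconstruction, not part of the CS231N handout. Because np.mean averages over all N·2 entries, the loss is

$$L = \frac{1}{2N}\sum_{i=1}^{N}\sum_{c=1}^{2}\Big[-\tilde{y}_{ic}\log p_{ic}-(1-\tilde{y}_{ic})\log(1-p_{ic})\Big],$$

and the chain rule through $p_{ic}=\sigma(x_{ic})$, using $\sigma'(x)=\sigma(x)(1-\sigma(x))$, gives

$$\frac{\partial L}{\partial x_{ic}}
= \frac{1}{2N}\left(-\frac{\tilde{y}_{ic}}{p_{ic}}+\frac{1-\tilde{y}_{ic}}{1-p_{ic}}\right)p_{ic}(1-p_{ic})
= \frac{p_{ic}-\tilde{y}_{ic}}{2N}.$$

With the usual convention of averaging over the N samples only (as in Andrew Ng's videos), the denominator would be $N$, which is exactly the factor-of-2 mismatch against the numerical gradient.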
The full script, with the TensorFlow comparison and the CS231N gradient check:

import numpy as np
import keras.backend as K
import tensorflow as tf

def eval_numerical_gradient(f, x, verbose=True, h=0.00001):
    """
    a naive implementation of numerical gradient of f at x
    - f should be a function that takes a single argument
    - x is the point (numpy array) to evaluate the gradient at
    """
    fx = f(x)  # evaluate function value at original point
    grad = np.zeros_like(x)
    # iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        oldval = x[ix]
        x[ix] = oldval + h  # increment by h
        fxph = f(x)         # evaluate f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)         # evaluate f(x - h)
        x[ix] = oldval      # restore
        # compute the partial derivative with the centered formula
        grad[ix] = (fxph - fxmh) / (2 * h)  # the slope
        if verbose:
            print(ix, grad[ix])
        it.iternext()       # step to next dimension
    return grad

def rel_error(x, y):
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

def sigmoid_forward(Z):
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache

def sigmoid_loss(x, y):
    # binary classification only; use softmax_loss for multi-class.
    # y is a list of integer labels, not a one-hot matrix.
    probs, _ = sigmoid_forward(x)
    N = x.shape[0]
    y_ = np.zeros((N, 2))
    y_[np.arange(N), y] = 1
    loss = -y_ * np.log(probs) - (1 - y_) * np.log(1 - probs)
    loss = np.mean(loss)
    da = -(y_ / probs - (1 - y_) / (1 - probs))
    # divide by 2N: each label is counted twice (a 1-label and a 0-label term),
    # so np.mean averages over 2N entries. This differs from Andrew Ng's
    # videos, where each label is counted once.
    dx = da * probs * (1 - probs) / N / 2
    return loss, dx

num_classes, num_inputs = 2, 5
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)
loss_sig, _ = sigmoid_loss(x, y)
print('loss of my sigmoid:', loss_sig)
print()

# TensorFlow version for comparison.
probs, _ = sigmoid_forward(x)
N = x.shape[0]
y_ = np.zeros((N, 2))
y_[np.arange(N), y] = 1
# tf.nn.sigmoid_cross_entropy_with_logits takes one-hot labels and raw
# logits (not probabilities), since it applies the sigmoid internally.
loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=x)
loss_tf_ave = tf.reduce_mean(loss_tf)
print("the loss tf is: ", K.eval(loss_tf))
print("the loss tf ave is: ", K.eval(loss_tf_ave))

# gradient check
dx_num = eval_numerical_gradient(lambda x: sigmoid_loss(x, y)[0], x, verbose=False)
loss, dx = sigmoid_loss(x, y)
print('\ntesting sigmoid_loss:')
print('loss:', loss)
print('dx error:', rel_error(dx_num, dx))

Output:
loss of my sigmoid: 0.693255238734
the loss tf is: [[ 0.69340774 0.69344562]
[ 0.69317672 0.6938127 ]
[ 0.692981 0.69248962]
[ 0.6933821 0.6935182 ]
[ 0.69347767 0.69286103]]
the loss tf ave is: 0.693255238734
testing sigmoid_loss:
loss: 0.693255238734
dx error: 8.05752159213e-11
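As a sanity check on the derivation above, the chain-rule expression da * probs * (1 - probs) should collapse algebraically to probs - y_, so the backward pass can be written in one line. A minimal sketch (the helper name sigmoid_loss_simplified is mine; it reuses sigmoid_forward and the x, y arrays from the script above):

def sigmoid_loss_simplified(x, y):
    # same loss as sigmoid_loss, but with the gradient written in its
    # simplified (probs - labels) / (2N) form
    probs, _ = sigmoid_forward(x)
    N = x.shape[0]
    y_ = np.zeros((N, 2))
    y_[np.arange(N), y] = 1
    loss = np.mean(-y_ * np.log(probs) - (1 - y_) * np.log(1 - probs))
    dx = (probs - y_) / (2 * N)  # 2N because np.mean averages over N*2 entries
    return loss, dx

# the two gradient forms should agree to machine precision
_, dx_simple = sigmoid_loss_simplified(x, y)
_, dx_full = sigmoid_loss(x, y)
print('max diff between gradient forms:', np.max(np.abs(dx_simple - dx_full)))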