When it comes to gradient descent, all you can say is: there is no most formidable variant, only a more formidable one.

Not long after momentum gradient descent appeared, Nesterov gradient descent was proposed. It inherits the idea of momentum, but observes that even when the current gradient is 0, the accumulated momentum keeps the update alive and w continues to move.

Because of this, the gradient at the current point w is not what really matters. What matters is the gradient at the next point w, the point that momentum alone would roll you to, and it is this lookahead gradient that should steer the current update.

An intuitive example: when you are running downhill and are almost, but not quite, at the bottom, momentum gradient descent will carry you up the opposite slope. Nesterov gradient descent foresees that your next step would land on the opposite slope, so it tells you to brake in advance and avoid overshooting. In other words, it computes the gradient of the next step ahead of time and uses it to guide the current step.

How is this implemented? Look at the formulas first:
$oldw = w$
$w = w - \eta \cdot lg \cdot discount$

Compute the gradient $g$ of this next $w$ (the lookahead point), then restore the saved weights and apply the usual momentum update:

$w = oldw$
$lg = lg \cdot discount + g$
$w = w - \eta \cdot lg$

Here $lg$ is the accumulated gradient (the momentum term), $discount$ is the momentum coefficient, and $\eta$ is the learning rate.
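For reference, this is exactly the textbook Nesterov accelerated gradient update; writing $\gamma$ for $discount$ and velocity $v$ for $lg$ (these symbols are standard notation, not names used above), it reads:

$v_t = \gamma \, v_{t-1} + \nabla f\!\left(w_{t-1} - \eta \gamma \, v_{t-1}\right)$
$w_t = w_{t-1} - \eta \, v_t$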
In short, each time we compute the gradient we use the next step's weights, so the next step's change is already reflected in the gradient computed now. If the gradient at the next step's weights is large, the current weight update should take that lookahead gradient into account.
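Before the full training script below, here is a minimal vectorized sketch of a single update step in the same spirit (the helper name grad and the function signature are illustrative, not from the original post):

import numpy as np

def nesterov_step(w, lg, grad, eta=0.02, discount=0.9):
    # Evaluate the gradient at the lookahead point, where momentum alone would take w
    lookahead = w - eta * discount * lg
    g = grad(lookahead)       # gradient of the loss at the lookahead weights
    lg = discount * lg + g    # accumulate it into the momentum term
    w = w - eta * lg          # apply the combined update to the original w
    return w, lg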
''' Full-batch gradient descent with Nesterov momentum '''
import numpy as np
import time

print(__doc__)

sample = 10
num_input = 5

# Build the training data
np.random.seed(0)
normalRand = np.random.normal(0, 0.1, sample)    # 10 noise terms with mean 0, std 0.1 (b)
weight = [5, 100, -5, -400, 0.02]                # true weights, 1 x 5
x_train = np.random.random((sample, num_input))  # x data (10 x 5)
y_train = np.zeros(sample)                       # y data (10,)
for i in range(0, len(x_train)):
    total = 0
    for j in range(0, len(x_train[i])):
        total += weight[j] * x_train[i, j]
    y_train[i] = total + normalRand[i]

# Training
np.random.seed(0)
weight = np.random.random(num_input + 1)
np.random.seed(0)
recordGrade = np.random.random(num_input + 1)
discount = 0.9
rate = 0.02
start = time.perf_counter()  # time.clock() was removed in Python 3.8
for epoch in range(0, 500):
    # Step ahead first: compute the next (lookahead) weights w
    oldweight = np.copy(weight)
    for i in range(0, len(weight)):
        weight[i] = weight[i] - rate * discount * recordGrade[i]
    # Compute the loss at the lookahead weights
    predictY = np.zeros(len(x_train))
    for i in range(0, len(x_train)):
        predictY[i] = np.dot(x_train[i], weight[0:num_input]) + weight[num_input]
    loss = 0
    for i in range(0, len(x_train)):
        loss += (predictY[i] - y_train[i]) ** 2
    print("epoch: %d - loss: %f" % (epoch, loss))  # print iteration count and loss
    if loss < 0.1:
        end = time.perf_counter()
        print("time to converge: %s s" % str(end - start))
        print("converged at epoch %d" % epoch)
        break
    # Compute the gradients and update
    weight = oldweight
    for i in range(0, len(weight) - 1):  # weights w
        grade = 0
        for j in range(0, len(x_train)):
            grade += 2 * (predictY[j] - y_train[j]) * x_train[j, i]
        # The gradient uses the lookahead weights (via predictY);
        # the update is applied to the original weights
        recordGrade[i] = recordGrade[i] * discount + grade
        weight[i] = weight[i] - rate * recordGrade[i]
    grade = 0
    for j in range(0, len(x_train)):  # bias b
        grade += 2 * (predictY[j] - y_train[j])
    recordGrade[num_input] = recordGrade[num_input] * discount + grade
    weight[num_input] = weight[num_input] - rate * recordGrade[num_input]
print(weight)
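As a quick sanity check (not part of the original post), the nesterov_step sketch above can be run on a one-dimensional quadratic f(w) = (w - 3)^2, where it settles near the minimum at w = 3:

import numpy as np

w = np.array([0.0])
lg = np.zeros(1)
grad = lambda w: 2.0 * (w - 3.0)   # gradient of f(w) = (w - 3)^2
for _ in range(200):
    w, lg = nesterov_step(w, lg, grad, eta=0.05, discount=0.9)
print(w)   # approximately [3.]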