
PyTorch Gradient Optimization [PyTorch Basics, Part 2]

Debug prep: 3D plotting

To study a function of two variables it helps to see the whole surface at a glance, so we first write a small helper, draw3D_func.

import numpy as np
import matplotlib.pyplot as plt

def draw3D_func(func, x_range=np.arange(-6, 6, 0.1), y_range=np.arange(-6, 6, 0.1)):
    X, Y = np.meshgrid(x_range, y_range)  # build the grid of coordinates
    Z = func([X, Y])
    # plt.gca(projection='3d') was removed in newer Matplotlib; create the 3D axes explicitly
    ax = plt.figure().add_subplot(projection='3d')
    ax.plot_surface(X, Y, Z)
    ax.view_init(60, -30)
    ax.set_xlabel('x[0]')
    ax.set_ylabel('x[1]')
    plt.show()
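
For example (purely illustrative), the surface of the function used in the next section can be drawn by passing draw3D_func a NumPy-compatible callable:

# f(x1, x2) = -(cos^2 x1 + cos^2 x2)^2, written with NumPy so it works on the meshgrid arrays
draw3D_func(lambda x: -(np.cos(x[0])**2 + np.cos(x[1])**2)**2)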

Computing the value and gradient at a given point

Suppose the function $f(x_1,x_2)=-(\cos^2 x_1+\cos^2 x_2)^2$. We want the value of $f$ at $X=[\frac{\pi}{3},\frac{\pi}{6}]$ and the gradient at that point.
Using f.backward()

from math import pi
import torch
from debug import ptf_tensor

# Example: value and gradient at a fixed point
x = torch.tensor([pi/3, pi/6], requires_grad=True)
f = -((x.cos() ** 2).sum()) ** 2
ptf_tensor(f, 'f(x1,x2) value')
f.backward()  # compute the gradient
ptf_tensor(x.grad, 'f grad')  # gradient at [pi/3, pi/6]

Output:

The info of f(x1,x2) value:
#############################
  @dims: 0
  @size: torch.Size([])
  @ele_sum: 1
  @dtype: torch.float32
  @data:
-1.0
#############################


The info of f grad:
#############################
  @dims: 1
  @size: torch.Size([2])
  @ele_sum: 2
  @dtype: torch.float32
  @data:
tensor([1.7321, 1.7321])
#############################
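
As a sanity check, this matches the analytic gradient. With $f(x_1,x_2)=-(\cos^2 x_1+\cos^2 x_2)^2$,

$\frac{\partial f}{\partial x_1}=-2(\cos^2 x_1+\cos^2 x_2)\cdot(-2\cos x_1\sin x_1)=2(\cos^2 x_1+\cos^2 x_2)\sin 2x_1$

and symmetrically for $x_2$. At $X=[\frac{\pi}{3},\frac{\pi}{6}]$ we have $\cos^2\frac{\pi}{3}+\cos^2\frac{\pi}{6}=\frac{1}{4}+\frac{3}{4}=1$, so $f=-1$, $\frac{\partial f}{\partial x_1}=2\sin\frac{2\pi}{3}=\sqrt{3}\approx 1.7321$ and $\frac{\partial f}{\partial x_2}=2\sin\frac{\pi}{3}=\sqrt{3}$, matching the tensor([1.7321, 1.7321]) computed by autograd.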

Finding the X where the gradient approaches 0

For this we need torch.optim.

from math import pi
import torch
import torch.optim

# Example: iterative gradient descent
def func(x):
    return -((x.cos() ** 2).sum()) ** 2

x = torch.tensor([pi/3, pi/6], requires_grad=True)
f = func(x)

# momentum helps gradient descent escape local extrema; we don't need it here, so it stays 0
optimizer = torch.optim.SGD([x,], lr=0.1, momentum=0)
for step in range(11):
    if step:
        optimizer.zero_grad()  # clear gradients from the previous iteration
        f.backward()           # compute the gradient
        optimizer.step()       # take one gradient-descent step
    f = func(x)
    print('step {}: x = {}, f(x) = {}, grad = {}'.format(step, x.tolist(), f, x.grad))

Output:

step 0: x = [1.0471975803375244, 0.5235987901687622], f(x) = -1.0, grad = None
step 1: x = [0.8739925026893616, 0.35039371252059937], f(x) = -1.674528956413269, grad = tensor([1.7321, 1.7321])
step 2: x = [0.6192374229431152, 0.1835097223520279], f(x) = -2.6563119888305664, grad = tensor([2.5476, 1.6688])
step 3: x = [0.3111077845096588, 0.06654246151447296], f(x) = -3.617122173309326, grad = tensor([3.0813, 1.1697])
step 4: x = [0.08941137790679932, 0.016069628298282623], f(x) = -3.9671425819396973, grad = tensor([2.2170, 0.5047])
step 5: x = [0.01855570822954178, 0.0032690390944480896], f(x) = -3.99858021736145, grad = tensor([0.7086, 0.1280])
step 6: x = [0.0037171822041273117, 0.0006542906630784273], f(x) = -3.999943256378174, grad = tensor([0.1484, 0.0261])
step 7: x = [0.0007434850558638573, 0.00013086199760437012], f(x) = -3.999997615814209, grad = tensor([0.0297, 0.0052])
step 8: x = [0.00014869740698486567, 2.617243444547057e-05], f(x) = -4.0, grad = tensor([0.0059, 0.0010])
step 9: x = [2.973947994178161e-05, 5.234485797700472e-06], f(x) = -4.0, grad = tensor([0.0012, 0.0002])
step 10: x = [5.947895260760561e-06, 1.0468970685906243e-06], f(x) = -4.0, grad = tensor([2.3792e-04, 4.1876e-05])

By step 10 the gradient is already very close to 0 and x has converged to [0, 0]. This agrees with the analytic minimum: $f(0,0)=-(\cos^2 0+\cos^2 0)^2=-4$, which matches the f(x) = -4.0 reported above.
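
The momentum argument mentioned in the comment above can be enabled in the same loop. A minimal sketch, with momentum=0.9 chosen arbitrarily for illustration:

from math import pi
import torch
import torch.optim

def func(x):
    return -((x.cos() ** 2).sum()) ** 2

x = torch.tensor([pi/3, pi/6], requires_grad=True)
# same SGD loop, but with momentum enabled to damp oscillations and help escape shallow local extrema
optimizer = torch.optim.SGD([x,], lr=0.1, momentum=0.9)
for step in range(11):
    optimizer.zero_grad()
    f = func(x)
    f.backward()
    optimizer.step()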

Another optimizer: Adam

We now use Adam to optimize the Himmelblau function:
$f(x)=(x_0^2+x_1-11)^2+(x_0+x_1^2-7)^2$
Using draw3D_func we can plot its surface:
(figure: surface plot of the Himmelblau function)
This function has four local minima and one local maximum.
Because it has four minima, starting from different initial points can lead the optimizer to different minima (see the sketch after the output below).
Note that we do not set a learning rate for Adam here: it falls back to its default (lr=1e-3) and adaptively rescales each parameter's update, so manual tuning is often unnecessary.

import torch
import torch.optim
from debug import draw3D_func

def himmelblau(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

draw3D_func(himmelblau)  # plot the surface
x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x,])

for step in range(20001):
    if step:
        optimizer.zero_grad()
        f.backward()  # f.backward() seeks a local minimum; use (-f).backward() instead to seek a local maximum
        optimizer.step()
    f = himmelblau(x)
    if step % 1000 == 0:
        print('step {}: x = {}, f(x) = {}, grad = {}'.format(step, x.tolist(), f, x.grad))

Output:

step 0: x = [0.0, 0.0], f(x) = 170.0, grad = None
step 1000: x = [1.270142912864685, 1.1183991432189941], f(x) = 88.42720031738281, grad = tensor([-50.9531, -36.5798])
step 2000: x = [2.332378387451172, 1.9535712003707886], f(x) = 13.730916023254395, grad = tensor([-35.3822, -13.8925])
step 3000: x = [2.8519949913024902, 2.114161968231201], f(x) = 0.6689231395721436, grad = tensor([-7.9508,  1.2138])
step 4000: x = [2.981964111328125, 2.0271568298339844], f(x) = 0.014858869835734367, grad = tensor([-0.7824,  0.5803])
step 5000: x = [2.9991261959075928, 2.0014777183532715], f(x) = 3.956971340812743e-05, grad = tensor([-0.0352,  0.0329])
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08, grad = tensor([-0.0008,  0.0004])
step 7000: x = [2.9999899864196777, 2.000013589859009], f(x) = 4.150251697865315e-09, grad = tensor([-0.0005,  0.0003])
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09, grad = tensor([-0.0003,  0.0002])
step 9000: x = [2.9999964237213135, 2.000005006790161], f(x) = 5.256879376247525e-10, grad = tensor([-1.6212e-04,  9.7275e-05])
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10, grad = tensor([-9.5367e-05,  5.7221e-05])
step 11000: x = [2.9999988079071045, 2.0000014305114746], f(x) = 5.547917680814862e-11, grad = tensor([-5.9128e-05,  2.6703e-05])
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11, grad = tensor([-2.8610e-05,  1.7166e-05])
step 13000: x = [2.999999523162842, 2.000000476837158], f(x) = 5.6843418860808015e-12, grad = tensor([-2.0027e-05,  7.6294e-06])
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12, grad = tensor([-9.5367e-06,  5.7220e-06])
step 15000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
step 16000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
step 17000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
step 18000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
step 19000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
step 20000: x = [3.0, 2.0], f(x) = 0.0, grad = tensor([0., 0.])
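
As noted above, a different starting point can land in a different one of the four minima. A minimal sketch; the starting point [-4., 0.] is an arbitrary choice for illustration:

import torch
import torch.optim

def himmelblau(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

x = torch.tensor([-4., 0.], requires_grad=True)  # a different (arbitrary) starting point
optimizer = torch.optim.Adam([x,])
for step in range(20001):
    if step:
        optimizer.zero_grad()
        f.backward()
        optimizer.step()
    f = himmelblau(x)
print('x = {}, f(x) = {}'.format(x.tolist(), f))
# This should converge to one of the other minima (e.g. near (-3.78, -3.28) or (-2.81, 3.13))
# rather than the (3, 2) reached from the origin above.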

Overview of the optimization algorithms available in torch.optim:
(figure: overview of the torch.optim optimizers)
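
The original overview image is not reproduced here; as a rough sketch (the hyperparameter values are library defaults or arbitrary illustrative choices), the common optimizers are all constructed the same way:

import torch

params = [torch.zeros(2, requires_grad=True)]  # placeholder parameters for illustration

sgd      = torch.optim.SGD(params, lr=0.1, momentum=0.9)  # plain / momentum SGD
adagrad  = torch.optim.Adagrad(params)                    # per-parameter adaptive step sizes
rmsprop  = torch.optim.RMSprop(params)                    # exponentially decayed squared gradients
adadelta = torch.optim.Adadelta(params)                   # no explicit learning rate needed
adam     = torch.optim.Adam(params, lr=1e-3)              # the optimizer used above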
