
Usage and visualization of several learning-rate scheduler mechanisms (policies) in PyTorch

Note: throughout this post the examples use AlexNet as the network architecture (it expects 227x227x3 input images), CIFAR10 as the dataset, and SGD as the optimizer.

When running the program, the files are laid out as follows:

/content/drive/MyDrive/coder/Simple-CV-Pytorch-master
|
|----AlexNet----train.py (train_adjust_learning_rate.py, train_MultiStepLR.py, etc.)
|
|----tensorboard (folder that stores the TensorBoard logs)
|
|----checkpoint (folder that stores the saved models)
|
|----data (folder that contains the dataset)
|
|----run.ipynb (the .ipynb file used to launch training)

At some point the learning rate we originally set may no longer be able to drive the loss down any further, so it has to be re-adjusted. When working in PyTorch, this is exactly what the schedulers are for.

The scheduler mechanisms (policies) live under torch.optim.lr_scheduler.XX.
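All of the schedulers below follow the same call pattern: wrap the optimizer, train for an epoch, then call scheduler.step() once per epoch after optimizer.step(). The sketch below is purely illustrative of that pattern (assumptions: StepLR is used only as a placeholder, and a single dummy parameter stands in for a real model); ReduceLROnPlateau in section 5 is the one exception, since its step() takes the monitored metric.

```python
import torch

# Minimal sketch of the common scheduler call pattern (dummy parameter,
# StepLR chosen only as a placeholder).
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.1)

for epoch in range(20):
    # ... forward / backward / optim.step() for every batch would go here ...
    optim.step()        # optimizer step first
    scheduler.step()    # then the scheduler step, once per epoch
    print(epoch, optim.param_groups[0]['lr'])
```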


If you do not use any scheduler mechanism (policy), you can simply modify the learning rate directly:

for param_group in optim.param_groups:
    param_group['lr'] = lr

There are roughly seven commonly used scheduler mechanisms (policies). We introduce them one by one, give the code for each, and visualize the result to make them easier to understand:

1. Custom learning-rate decay: adjust_learning_rate()

Explanation of this (author-written) function: every size (here 2) epochs, counting epochs from 0, the learning rate is reset to the initial learning rate multiplied by gamma (0.1) raised to the number of completed stages, i.e. lr = initial_lr * gamma ** ((epoch + 1) // size).

def adjust_learning_rate(optim, epoch, size=2, gamma=0.1):
    if (epoch + 1) % size == 0:
        pow = (epoch + 1) // size
        lr = learning_rate * np.power(gamma, pow)
        for param_group in optim.param_groups:
            param_group['lr'] = lr
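To see what this function does without running any training, here is a minimal sketch (assumption: a dummy single-parameter SGD optimizer instead of the real model) that just prints the schedule over 20 epochs; the values match the analysis further below.

```python
import numpy as np
import torch

learning_rate = 1e-3

def adjust_learning_rate(optim, epoch, size=2, gamma=0.1):
    if (epoch + 1) % size == 0:
        pow = (epoch + 1) // size
        lr = learning_rate * np.power(gamma, pow)
        for param_group in optim.param_groups:
            param_group['lr'] = lr

# Dry run: print the lr used during each epoch, then update at the epoch end,
# exactly as the training loop below does.
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=learning_rate)
for epoch in range(20):
    print(epoch, optim.param_groups[0]['lr'])
    adjust_learning_rate(optim, epoch)
```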

If you want to understand how the training code works, see my earlier post:

19. Getting started with PyTorch: the complete model pipeline - cleaned-up code https://blog.csdn.net/XiaoyYidiaodiao/article/details/122720320?spm=1001.2014.3001.5501 Note: this code is extremely bare-bones, just the skeleton I use for everyday experiments; it is not up to production standards, but it is enough to follow the idea.

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np


def adjust_learning_rate(optim, epoch, size=2, gamma=0.1):
    if (epoch + 1) % size == 0:
        pow = (epoch + 1) // size
        lr = learning_rate * np.power(gamma, pow)
        for param_group in optim.param_groups:
            param_group['lr'] = lr


# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
lr = learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    adjust_learning_rate(optim, i)
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the program above is run directly in PyCharm.

Our train.py lives in the AlexNet folder, and the data, tensorboard and checkpoint folders are siblings of AlexNet, so the paths are prefixed with ../ to go up one level: ../data, ../tensorboard, ../checkpoint.

Result:


Visualization of lr and loss: lr (the orange, semi-transparent curve)


Analysis:

(1) Epochs 0-1: lr = 0.001
(2) Epochs 2-3: lr = 0.0001
(3) Epochs 4-5: lr = 1.0000000000000003e-05
(4) Epochs 6-7: lr = 1.0000000000000002e-06
(5) Epochs 8-9: lr = 1.0000000000000002e-07
(6) Epochs 10-11: lr = 1.0000000000000004e-08
(7) Epochs 12-13: lr = 1.0000000000000005e-09
(8) Epochs 14-15: lr = 1.0000000000000004e-10
(9) Epochs 16-17: lr = 1.0000000000000006e-11
(10) Epochs 18-19: lr = 1.0000000000000006e-12

(The trailing digits are floating-point rounding noise; the schedule is simply 1e-3, 1e-4, 1e-5, ... for each two-epoch stage.)

2. Piecewise decay at given milestones: MultiStepLR()

scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)

Its parameters:

 def __init__(self, optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False):
optimizer: the wrapped optimizer whose learning rate will be adjusted
milestones: list of epoch indices at which the learning rate changes
gamma: at each milestone the learning rate is multiplied by gamma
last_epoch=-1: index of the last epoch already run; best left at the default
verbose=False: whether to print a message on every update
For example (a runnable sketch that prints this schedule follows the list):
MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)
lr=1e-3, len(epoch)=20, milestones=[5, 10, 15], gamma=0.1
epoch <= 4:        lr = 1e-3
5 <= epoch <= 9:   lr = 1e-4
10 <= epoch <= 14: lr = 1e-5
15 <= epoch < 20:  lr = 1e-6
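Before plugging it into the full training script, here is a minimal dry-run sketch (assumption: a dummy single-parameter optimizer, no training) that prints the MultiStepLR schedule over 20 epochs:

```python
import torch

# Dummy parameter instead of a real model; only the schedule is of interest.
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1)

for epoch in range(20):
    optim.step()                             # placeholder for one training epoch
    print(epoch, scheduler.get_last_lr()[0])  # lr used during this epoch
    scheduler.step()                          # update at the end of the epoch
```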

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr1: {} | lr2: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, scheduler.get_lr()[0], scheduler.get_last_lr()[0], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", scheduler.get_lr()[0], i)
    writer.add_scalar("lr_last", scheduler.get_last_lr()[0], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the .ipynb file used to launch the program above is:

import os
os.chdir("/content/drive/MyDrive/coder/Simple-CV-Pytorch-master")
!python AlexNet/train.py

In other words, although train.py sits in the AlexNet folder, the working directory (base_dir) of this notebook is /content/drive/MyDrive/coder/Simple-CV-Pytorch-master, so the data, tensorboard and checkpoint folders are referenced directly as data, tensorboard and checkpoint, without the ../ prefix that goes up one level.

If you want to know how to use Google's GPU servers for free, see my earlier post:

How a broke student like me uses Google GPUs for free https://blog.csdn.net/XiaoyYidiaodiao/article/details/122751289?spm=1001.2014.3001.5501

Result:


Visualization of lr and loss: lr (the orange, semi-transparent curve)


Analysis:

lr: scheduler.get_lr()[0]
(1) Epochs 0-4: lr = 1e-3
(2) Epoch 5: lr = 1e-5
(3) Epochs 6-9: lr = 1e-4
(4) Epoch 10: lr = 1e-6
(5) Epochs 11-14: lr = 1e-5
(6) Epoch 15: lr = 1e-7
(7) Epochs 16-19: lr = 1e-6

lr_last: scheduler.get_last_lr()[0]
(1) Epochs 0-4: lr_last = 1e-3
(2) Epochs 5-9: lr_last = 1e-4
(3) Epochs 10-14: lr_last = 1e-5
(4) Epochs 15-19: lr_last = 1e-6

So the learning rate that is actually in effect is the one reported by scheduler.get_last_lr()[0]. The dips that get_lr()[0] shows at the milestone epochs (1e-5 at epoch 5, and so on) come from calling get_lr() outside of scheduler.step(): its closed-form computation applies gamma once more on top of the already-decayed rate.
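The difference is easy to reproduce without training. A minimal sketch (assumption: a dummy single-parameter optimizer); note that calling get_lr() outside of scheduler.step() also emits a UserWarning in recent PyTorch versions:

```python
import torch

# Compare get_lr() with get_last_lr() and the lr stored on the optimizer.
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1)

for epoch in range(20):
    optim.step()
    # get_lr() multiplies by gamma again at a milestone epoch (hence 1e-5 at
    # epoch 5); get_last_lr() and param_groups[0]['lr'] report the lr actually
    # in use during this epoch.
    print(epoch, scheduler.get_lr()[0], scheduler.get_last_lr()[0],
          optim.param_groups[0]['lr'])
    scheduler.step()
```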

3. Decay every fixed number of epochs: StepLR()

scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.2)

Its parameters:

def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False):

optimizer: the wrapped optimizer whose learning rate will be adjusted
step_size: number of epochs between two decays
gamma: after every step_size epochs the learning rate is multiplied by gamma
last_epoch=-1: best left at the default
verbose=False: whether to print a message on every update
For example (a runnable sketch that prints this schedule follows the list):
StepLR(optim, step_size=5, gamma=0.2)
lr=1e-3, len(epoch)=20, step_size=5, gamma=0.2
epoch < 5:         lr = 1e-3
5 <= epoch < 10:   lr = 2e-4
10 <= epoch < 15:  lr = 4e-5
15 <= epoch < 20:  lr = 8e-6
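A minimal dry-run sketch (assumption: a dummy single-parameter optimizer, no training) that prints this StepLR schedule:

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.2)

for epoch in range(20):
    optim.step()                              # placeholder for one training epoch
    print(epoch, scheduler.get_last_lr()[0])  # lr used during this epoch
    scheduler.step()
```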

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.2)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, scheduler.get_last_lr()[0], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", scheduler.get_last_lr()[0], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the program above was run directly in PyCharm (my free Google GPU quota had run out).

Our train.py lives in the AlexNet folder, and the data, tensorboard and checkpoint folders are siblings of AlexNet, so the paths are prefixed with ../ to go up one level: ../data, ../tensorboard, ../checkpoint.

Result:


Visualization of lr and loss: lr (the orange, semi-transparent curve):


Analysis:

lr_last: scheduler.get_last_lr()[0]
(1) Epochs 0-4: lr_last = 1e-3
(2) Epochs 5-9: lr_last = 2e-4
(3) Epochs 10-14: lr_last = 4e-5
(4) Epochs 15-19: lr_last = 8e-6

4. Lambda-based learning-rate adjustment: LambdaLR()

lambda1 = lambda epoch: (epoch) // 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=lambda1)

Its parameters:

def __init__(self, optimizer, lr_lambda, last_epoch=-1, verbose=False):

optimizer: the wrapped optimizer whose learning rate will be adjusted
lr_lambda: a function of the epoch index (or a list of such functions, one per parameter group)
last_epoch=-1: best left at the default
verbose=False: whether to print a message on every update
For example (a runnable sketch that prints this schedule follows the list):
new_lr = lr_lambda(epoch) * initial_lr
lambda1 = lambda epoch: epoch // 2
LambdaLR(optimizer, lr_lambda=lambda1, last_epoch=-1)
when epoch=0, new_lr = (0 // 2) * 0.001 = 0 * 0.001 = 0
when epoch=1, new_lr = (1 // 2) * 0.001 = 0 * 0.001 = 0
when epoch=2, new_lr = (2 // 2) * 0.001 = 1 * 0.001 = 0.001
when epoch=3, new_lr = (3 // 2) * 0.001 = 1 * 0.001 = 0.001
when epoch=4, new_lr = (4 // 2) * 0.001 = 2 * 0.001 = 0.002
...
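A minimal dry-run sketch (assumption: a dummy single-parameter optimizer, no training) that prints this LambdaLR schedule:

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
lambda1 = lambda epoch: epoch // 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=lambda1)

for epoch in range(20):
    optim.step()                              # placeholder for one training epoch
    print(epoch, optim.param_groups[0]['lr'])  # = (epoch // 2) * 1e-3
    scheduler.step()
```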

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
lambda1 = lambda epoch: (epoch) // 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=lambda1)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Result:


Visualization of lr and loss:


Analysis:

new_lr = lr_lambda(epoch) * initial_lr
lambda1 = lambda epoch: epoch // 2
LambdaLR(optimizer, lr_lambda=lambda1, last_epoch=-1)
when epoch=0, new_lr = (0 // 2) * 0.001 = 0 * 0.001 = 0
when epoch=1, new_lr = (1 // 2) * 0.001 = 0 * 0.001 = 0
when epoch=2, new_lr = (2 // 2) * 0.001 = 1 * 0.001 = 0.001
when epoch=3, new_lr = (3 // 2) * 0.001 = 1 * 0.001 = 0.001
when epoch=4, new_lr = (4 // 2) * 0.001 = 2 * 0.001 = 0.002
when epoch=5, new_lr = (5 // 2) * 0.001 = 2 * 0.001 = 0.002
when epoch=6, new_lr = (6 // 2) * 0.001 = 3 * 0.001 = 0.003
when epoch=7, new_lr = (7 // 2) * 0.001 = 3 * 0.001 = 0.003
when epoch=8, new_lr = (8 // 2) * 0.001 = 4 * 0.001 = 0.004
when epoch=9, new_lr = (9 // 2) * 0.001 = 4 * 0.001 = 0.004
when epoch=10, new_lr = (10 // 2) * 0.001 = 5 * 0.001 = 0.005
when epoch=11, new_lr = (11 // 2) * 0.001 = 5 * 0.001 = 0.005
when epoch=12, new_lr = (12 // 2) * 0.001 = 6 * 0.001 = 0.006
when epoch=13, new_lr = (13 // 2) * 0.001 = 6 * 0.001 = 0.006
when epoch=14, new_lr = (14 // 2) * 0.001 = 7 * 0.001 = 0.007
when epoch=15, new_lr = (15 // 2) * 0.001 = 7 * 0.001 = 0.007
when epoch=16, new_lr = (16 // 2) * 0.001 = 8 * 0.001 = 0.008
when epoch=17, new_lr = (17 // 2) * 0.001 = 8 * 0.001 = 0.008
when epoch=18, new_lr = (18 // 2) * 0.001 = 9 * 0.001 = 0.009
when epoch=19, new_lr = (19 // 2) * 0.001 = 9 * 0.001 = 0.009

5. Adaptive learning-rate adjustment: ReduceLROnPlateau()

This policy monitors a performance metric of the model; when the metric stops improving, it keeps watching for (patience) a few more epochs and then automatically reduces the learning rate.

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=3, verbose=True)
scheduler.step(np.mean(loss))

Its parameters:

def __init__(self, optimizer, mode='min', factor=0.1, patience=10,
             threshold=1e-4, threshold_mode='rel', cooldown=0,
             min_lr=0, eps=1e-8, verbose=False):

optimizer: the wrapped optimizer whose learning rate will be adjusted
mode: 'min' lowers the lr when the metric stops decreasing (e.g. a loss); 'max' lowers it when the metric stops increasing (e.g. an accuracy)
factor: the factor the learning rate is multiplied by, new_lr = lr * factor; 0.1 by default
patience: how many epochs to watch before lowering the lr; 10 by default
threshold: only changes larger than this threshold count as an improvement; 1e-4 by default
threshold_mode: either 'rel' or 'abs';
  'rel': in max mode a value above best * (1 + threshold) counts as an improvement, in min mode a value below best * (1 - threshold);
  'abs': in max mode above best + threshold, in min mode below best - threshold
cooldown: how many epochs to wait after a reduction before monitoring resumes, so the lr does not drop too fast; 0 by default
min_lr=0: lower bound on the learning rate; 0 by default
eps=1e-8: if the difference between the new and the old lr is smaller than eps, the update is skipped; 1e-8 by default
verbose=False: whether to print a message on every update
For example (a runnable sketch follows the list):
ReduceLROnPlateau(optim, patience=3, verbose=True)
when the loss stops improving, keep watching for (patience) 3 epochs and then automatically reduce the learning rate
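A minimal dry-run sketch (assumptions: a dummy single-parameter optimizer and a synthetic, stagnating loss curve) showing when ReduceLROnPlateau actually lowers the learning rate:

```python
import torch

# Once the monitored metric has stopped improving for more than `patience`
# epochs, the lr is multiplied by `factor` (0.1 by default).
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=3)

fake_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
for epoch, loss in enumerate(fake_losses):
    optim.step()
    scheduler.step(loss)                      # pass the monitored metric
    print(epoch, optim.param_groups[0]['lr'])
```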

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=3, verbose=True)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    loss = 0  # accumulated training loss over this epoch (the monitored metric)
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        loss = loss_train.item() + loss
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    # step on the mean training loss of this epoch
    scheduler.step(loss / len(train_dataloader))
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Result:


Visualization of lr and loss: lr (the orange, semi-transparent curve):


6. Exponential learning-rate decay: ExponentialLR()

scheduler = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.2)

Its parameters:

def __init__(self, optimizer, gamma, last_epoch=-1, verbose=False):

optimizer: the wrapped optimizer whose learning rate will be adjusted
gamma: multiplicative factor of the learning-rate decay
last_epoch=-1: best left at the default
verbose=False: whether to print a message on every update
For example (a runnable sketch that prints this schedule follows the list):
lr=1e-3, ExponentialLR(optim, gamma=0.2)
new_lr = lr * gamma^(epoch)
when epoch=0, new_lr = 0.001 * 0.2^0 = 0.001
when epoch=1, new_lr = 0.001 * 0.2^1 = 0.0002
when epoch=2, new_lr = 0.001 * 0.2^2 = 4e-5
...
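A minimal dry-run sketch (assumption: a dummy single-parameter optimizer, no training) that prints this ExponentialLR schedule:

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.2)

for epoch in range(20):
    optim.step()                              # placeholder for one training epoch
    print(epoch, optim.param_groups[0]['lr'])  # = 1e-3 * 0.2 ** epoch
    scheduler.step()
```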

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.2)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Result:


Visualization of lr and loss: lr (the orange, semi-transparent curve):


Analysis:

new_lr = lr * gamma^(epoch)
when epoch=0, new_lr = 0.001 * 0.2^0 = 0.001
when epoch=1, new_lr = 0.001 * 0.2^1 = 0.0002
when epoch=2, new_lr = 0.001 * 0.2^2 = 4e-5
when epoch=3, new_lr = 0.001 * 0.2^3 = 8e-6
when epoch=4, new_lr = 0.001 * 0.2^4 = 1.6e-6
when epoch=5, new_lr = 0.001 * 0.2^5 = 3.2e-7
when epoch=6, new_lr = 0.001 * 0.2^6 = 6.4e-8
when epoch=7, new_lr = 0.001 * 0.2^7 = 1.28e-8
when epoch=8, new_lr = 0.001 * 0.2^8 = 2.56e-9
when epoch=9, new_lr = 0.001 * 0.2^9 = 5.12e-10
when epoch=10, new_lr = 0.001 * 0.2^10 = 1.024e-10
when epoch=11, new_lr = 0.001 * 0.2^11 = 2.048e-11
when epoch=12, new_lr = 0.001 * 0.2^12 = 4.096e-12
when epoch=13, new_lr = 0.001 * 0.2^13 = 8.192e-13
when epoch=14, new_lr = 0.001 * 0.2^14 = 1.6384e-13
when epoch=15, new_lr = 0.001 * 0.2^15 = 3.2768e-14
when epoch=16, new_lr = 0.001 * 0.2^16 = 6.5536e-15
when epoch=17, new_lr = 0.001 * 0.2^17 = 1.31072e-15
when epoch=18, new_lr = 0.001 * 0.2^18 = 2.62144e-16
when epoch=19, new_lr = 0.001 * 0.2^19 = 5.24288e-17

7. Cosine-annealing learning-rate schedule: CosineAnnealingLR()

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=5)

Its parameters:

def __init__(self, optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False):

In cosine annealing the learning rate changes periodically.
optimizer: the wrapped optimizer whose learning rate will be adjusted
T_max (int): half of the cosine period, i.e. the number of epochs in one decay phase; after T_max epochs the learning rate starts rising again
eta_min (float): minimum learning rate, i.e. the lowest value the lr reaches within one period; 0 by default
last_epoch=-1: index of the last epoch already run; best left at the default
verbose=False: whether to print a message on every update
For example (a runnable sketch that prints this schedule follows the list):
CosineAnnealingLR(optim, T_max=5)
new_lr = eta_min + 0.5 * (initial_lr - eta_min) * (1 + cos(epoch / T_max * π))
where eta_min is the minimum learning rate and T_max is half of the cosine period
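A minimal dry-run sketch (assumption: a dummy single-parameter optimizer, no training) that prints the cosine-annealing schedule for T_max=5:

```python
import torch

# With eta_min=0 the lr follows 0.5 * 1e-3 * (1 + cos(pi * epoch / 5)):
# it decays from 1e-3 towards 0 over T_max=5 epochs, then rises back,
# repeating with period 2 * T_max.
param = torch.nn.Parameter(torch.zeros(1))
optim = torch.optim.SGD([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=5)

for epoch in range(20):
    optim.step()                              # placeholder for one training epoch
    print(epoch, optim.param_groups[0]['lr'])
    scheduler.step()
```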

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)
print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()
if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=5)

# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                        np.mean(loss_train.item())))
    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))
    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))
torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Result:


Visualization of lr and loss:


The code has been uploaded to GitHub:

https://github.com/HanXiaoyiGitHub/LrAdjustmentMechanism-Pytorch-master


Done!
