
Three Ways to Train a Model

Reply with the keyword Pytorch in the public account backend to get the GitHub address of this project.

PyTorch has no official high-level API. Models are usually built by subclassing nn.Module, with a hand-written custom training loop.

To make training more convenient, the author wrote a Keras-style model interface for PyTorch, torchkeras, which serves as a high-level API for PyTorch.
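
torchkeras is published on PyPI, so it can normally be installed with pip (assuming the package name is unchanged since this article was written):

pip install torchkeras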

This chapter introduces the following topics related to this high-level API for PyTorch:

  • The 3 ways to build a model (subclassing the nn.Module base class, using nn.Sequential, using model containers as helpers)

  • The 3 ways to train a model (script style, function style, torchkeras.Model class style)

  • Training a model on GPU (single-GPU training, multi-GPU training)

This article introduces the 3 ways to train a model.

PyTorch usually requires the user to write a custom training loop, and the coding style of training loops varies from person to person.

There are 3 typical styles of training-loop code: script style, function style, and class style.

Below, these 3 training styles are demonstrated by training a classification model on the MNIST dataset.

0. Preparing the Data

import torch
from torch import nn
from torchkeras import summary, Model
import torchvision
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor()])

ds_train = torchvision.datasets.MNIST(root="./data/minist/", train=True, download=True, transform=transform)
ds_valid = torchvision.datasets.MNIST(root="./data/minist/", train=False, download=True, transform=transform)

dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128, shuffle=True, num_workers=4)
dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128, shuffle=False, num_workers=4)

print(len(ds_train))
print(len(ds_valid))

60000
10000
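
As a quick sanity check of the loaders, we can pull one batch from dl_train (a minimal sketch, not in the original listing; the shapes follow from batch_size=128 and MNIST's 28x28 grayscale images):

features, labels = next(iter(dl_train))
print(features.shape)  # torch.Size([128, 1, 28, 28])
print(labels.shape)    # torch.Size([128])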

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

# Inspect a few samples
from matplotlib import pyplot as plt

plt.figure(figsize=(8, 8))
for i in range(9):
    img, label = ds_train[i]
    img = torch.squeeze(img)
    ax = plt.subplot(3, 3, i + 1)
    ax.imshow(img.numpy())
    ax.set_title("label = %d" % label)
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()

1. Script Style

The script-style training loop is the most common.

net = nn.Sequential()
net.add_module("conv1", nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3))
net.add_module("pool1", nn.MaxPool2d(kernel_size=2, stride=2))
net.add_module("conv2", nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5))
net.add_module("pool2", nn.MaxPool2d(kernel_size=2, stride=2))
net.add_module("dropout", nn.Dropout2d(p=0.1))
net.add_module("adaptive_pool", nn.AdaptiveMaxPool2d((1, 1)))
net.add_module("flatten", nn.Flatten())
net.add_module("linear1", nn.Linear(64, 32))
net.add_module("relu", nn.ReLU())
net.add_module("linear2", nn.Linear(32, 10))

print(net)

Sequential(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout): Dropout2d(p=0.1, inplace=False)
  (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  (flatten): Flatten()
  (linear1): Linear(in_features=64, out_features=32, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=32, out_features=10, bias=True)
)

summary(net, input_shape=(1, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 30, 30]             320
         MaxPool2d-2           [-1, 32, 15, 15]               0
            Conv2d-3           [-1, 64, 11, 11]          51,264
         MaxPool2d-4             [-1, 64, 5, 5]               0
         Dropout2d-5             [-1, 64, 5, 5]               0
 AdaptiveMaxPool2d-6             [-1, 64, 1, 1]               0
           Flatten-7                   [-1, 64]               0
            Linear-8                   [-1, 32]           2,080
              ReLU-9                   [-1, 32]               0
           Linear-10                  [-1, 10]             330
================================================================
Total params: 53,994
Trainable params: 53,994
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.003906
Forward/backward pass size (MB): 0.359695
Params size (MB): 0.205971
Estimated Total Size (MB): 0.569572
----------------------------------------------------------------

import datetime
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy(y_pred, y_true):
    y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred), dim=1).data
    return accuracy_score(y_true, y_pred_cls)

loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=net.parameters(), lr=0.01)
metric_func = accuracy
metric_name = "accuracy"
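
To see what this metric computes, here is a toy check (illustrative values, not from the original text): the argmax over the class dimension is compared against the labels.

y_pred = torch.tensor([[2.0, 0.5], [0.1, 3.0], [1.5, 0.2]])  # logits for 3 samples, 2 classes
y_true = torch.tensor([0, 1, 1])
print(accuracy(y_pred, y_true))  # 0.666..., 2 of 3 argmax predictions match the labels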

epochs = 3
log_step_freq = 100

dfhistory = pd.DataFrame(columns=["epoch", "loss", metric_name, "val_loss", "val_" + metric_name])
print("Start Training...")
nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print("==========" * 8 + "%s" % nowtime)

for epoch in range(1, epochs + 1):

    # 1. Training loop -------------------------------------------------
    net.train()
    loss_sum = 0.0
    metric_sum = 0.0
    step = 1

    for step, (features, labels) in enumerate(dl_train, 1):

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass to compute the loss and the metric
        predictions = net(features)
        loss = loss_func(predictions, labels)
        metric = metric_func(predictions, labels)

        # Backward pass to compute gradients, then update the weights
        loss.backward()
        optimizer.step()

        # Batch-level logging
        loss_sum += loss.item()
        metric_sum += metric.item()
        if step % log_step_freq == 0:
            print(("[step = %d] loss: %.3f, " + metric_name + ": %.3f") %
                  (step, loss_sum / step, metric_sum / step))

    # 2. Validation loop -----------------------------------------------
    net.eval()
    val_loss_sum = 0.0
    val_metric_sum = 0.0
    val_step = 1

    for val_step, (features, labels) in enumerate(dl_valid, 1):
        # Gradient tracking is unnecessary during evaluation
        with torch.no_grad():
            predictions = net(features)
            val_loss = loss_func(predictions, labels)
            val_metric = metric_func(predictions, labels)
        val_loss_sum += val_loss.item()
        val_metric_sum += val_metric.item()

    # 3. Record the history --------------------------------------------
    info = (epoch, loss_sum / step, metric_sum / step,
            val_loss_sum / val_step, val_metric_sum / val_step)
    dfhistory.loc[epoch - 1] = info

    # Epoch-level logging
    print(("\nEPOCH = %d, loss = %.3f, " + metric_name +
           " = %.3f, val_loss = %.3f, val_" + metric_name + " = %.3f") % info)
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("\n" + "==========" * 8 + "%s" % nowtime)

print('Finished Training...')

Start Training...
================================================================================2020-06-26 12:49:16
[step = 100] loss: 0.742, accuracy: 0.745
[step = 200] loss: 0.466, accuracy: 0.843
[step = 300] loss: 0.363, accuracy: 0.880
[step = 400] loss: 0.310, accuracy: 0.898

EPOCH = 1, loss = 0.281, accuracy = 0.908, val_loss = 0.087, val_accuracy = 0.972
================================================================================2020-06-26 12:50:32
[step = 100] loss: 0.103, accuracy: 0.970
[step = 200] loss: 0.114, accuracy: 0.966
[step = 300] loss: 0.112, accuracy: 0.967
[step = 400] loss: 0.108, accuracy: 0.968

EPOCH = 2, loss = 0.111, accuracy = 0.967, val_loss = 0.082, val_accuracy = 0.976
================================================================================2020-06-26 12:51:47
[step = 100] loss: 0.093, accuracy: 0.972
[step = 200] loss: 0.095, accuracy: 0.971
[step = 300] loss: 0.092, accuracy: 0.972
[step = 400] loss: 0.093, accuracy: 0.972

EPOCH = 3, loss = 0.098, accuracy = 0.971, val_loss = 0.113, val_accuracy = 0.970
================================================================================2020-06-26 12:53:09
Finished Training...
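
After training, dfhistory holds one row of metrics per epoch. A minimal sketch for visualizing the curves with plain matplotlib (not part of the original listing):

from matplotlib import pyplot as plt

# Plot training vs. validation loss, one point per epoch
plt.plot(dfhistory["epoch"], dfhistory["loss"], label="loss")
plt.plot(dfhistory["epoch"], dfhistory["val_loss"], label="val_loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()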

2. Function Style

This style wraps the script form above in a few simple functions, which makes the loop easier to reuse.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout2d(p=0.1),
            nn.AdaptiveMaxPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = Net()
print(net)

Net(
  (layers): ModuleList(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Dropout2d(p=0.1, inplace=False)
    (5): AdaptiveMaxPool2d(output_size=(1, 1))
    (6): Flatten()
    (7): Linear(in_features=64, out_features=32, bias=True)
    (8): ReLU()
    (9): Linear(in_features=32, out_features=10, bias=True)
  )
)

summary(net, input_shape=(1, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 30, 30]             320
         MaxPool2d-2           [-1, 32, 15, 15]               0
            Conv2d-3           [-1, 64, 11, 11]          51,264
         MaxPool2d-4             [-1, 64, 5, 5]               0
         Dropout2d-5             [-1, 64, 5, 5]               0
 AdaptiveMaxPool2d-6             [-1, 64, 1, 1]               0
           Flatten-7                   [-1, 64]               0
            Linear-8                   [-1, 32]           2,080
              ReLU-9                   [-1, 32]               0
           Linear-10                  [-1, 10]             330
================================================================
Total params: 53,994
Trainable params: 53,994
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.003906
Forward/backward pass size (MB): 0.359695
Params size (MB): 0.205971
Estimated Total Size (MB): 0.569572
----------------------------------------------------------------

import datetime
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy(y_pred, y_true):
    y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred), dim=1).data
    return accuracy_score(y_true, y_pred_cls)

model = net
model.optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.loss_func = nn.CrossEntropyLoss()
model.metric_func = accuracy
model.metric_name = "accuracy"

def train_step(model, features, labels):

    # Training mode: dropout layers are active
    model.train()

    # Zero the gradients
    model.optimizer.zero_grad()

    # Forward pass to compute the loss and the metric
    predictions = model(features)
    loss = model.loss_func(predictions, labels)
    metric = model.metric_func(predictions, labels)

    # Backward pass to compute gradients, then update the weights
    loss.backward()
    model.optimizer.step()

    return loss.item(), metric.item()

def valid_step(model, features, labels):

    # Evaluation mode: dropout layers are inactive
    model.eval()

    # Gradient tracking is unnecessary during evaluation
    with torch.no_grad():
        predictions = model(features)
        loss = model.loss_func(predictions, labels)
        metric = model.metric_func(predictions, labels)

    return loss.item(), metric.item()

# Sanity-check train_step on one batch
features, labels = next(iter(dl_train))
train_step(model, features, labels)

(2.32741117477417, 0.1015625)

def train_model(model, epochs, dl_train, dl_valid, log_step_freq):

    metric_name = model.metric_name
    dfhistory = pd.DataFrame(columns=["epoch", "loss", metric_name, "val_loss", "val_" + metric_name])
    print("Start Training...")
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("==========" * 8 + "%s" % nowtime)

    for epoch in range(1, epochs + 1):

        # 1. Training loop -------------------------------------------------
        loss_sum = 0.0
        metric_sum = 0.0
        step = 1

        for step, (features, labels) in enumerate(dl_train, 1):
            loss, metric = train_step(model, features, labels)

            # Batch-level logging
            loss_sum += loss
            metric_sum += metric
            if step % log_step_freq == 0:
                print(("[step = %d] loss: %.3f, " + metric_name + ": %.3f") %
                      (step, loss_sum / step, metric_sum / step))

        # 2. Validation loop -----------------------------------------------
        val_loss_sum = 0.0
        val_metric_sum = 0.0
        val_step = 1

        for val_step, (features, labels) in enumerate(dl_valid, 1):
            val_loss, val_metric = valid_step(model, features, labels)
            val_loss_sum += val_loss
            val_metric_sum += val_metric

        # 3. Record the history --------------------------------------------
        info = (epoch, loss_sum / step, metric_sum / step,
                val_loss_sum / val_step, val_metric_sum / val_step)
        dfhistory.loc[epoch - 1] = info

        # Epoch-level logging
        print(("\nEPOCH = %d, loss = %.3f, " + metric_name +
               " = %.3f, val_loss = %.3f, val_" + metric_name + " = %.3f") % info)
        nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        print("\n" + "==========" * 8 + "%s" % nowtime)

    print('Finished Training...')
    return dfhistory

epochs = 3
dfhistory = train_model(model, epochs, dl_train, dl_valid, log_step_freq=100)

Start Training...
================================================================================2020-06-26 13:10:00
[step = 100] loss: 2.298, accuracy: 0.137
[step = 200] loss: 2.288, accuracy: 0.145
[step = 300] loss: 2.278, accuracy: 0.165
[step = 400] loss: 2.265, accuracy: 0.183

EPOCH = 1, loss = 2.254, accuracy = 0.195, val_loss = 2.158, val_accuracy = 0.301
================================================================================2020-06-26 13:11:23
[step = 100] loss: 2.127, accuracy: 0.302
[step = 200] loss: 2.080, accuracy: 0.338
[step = 300] loss: 2.025, accuracy: 0.374
[step = 400] loss: 1.957, accuracy: 0.411

EPOCH = 2, loss = 1.905, accuracy = 0.435, val_loss = 1.469, val_accuracy = 0.710
================================================================================2020-06-26 13:12:43
[step = 100] loss: 1.435, accuracy: 0.615
[step = 200] loss: 1.324, accuracy: 0.647
[step = 300] loss: 1.221, accuracy: 0.672
[step = 400] loss: 1.132, accuracy: 0.696

EPOCH = 3, loss = 1.074, accuracy = 0.711, val_loss = 0.582, val_accuracy = 0.878
================================================================================2020-06-26 13:13:59
Finished Training...
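
Once train_model returns, the trained weights can be persisted with PyTorch's standard state_dict API; a short sketch (the file path is just an example):

# Save the trained parameters (example path)
torch.save(model.state_dict(), "./data/net_parameters.pkl")

# Restore them into a fresh instance of the same architecture
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/net_parameters.pkl"))
net_clone.eval()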

3. Class Style

Here we build the model with the model interface defined in torchkeras and train it by calling its compile and fit methods.

Training a model this way is very concise and clear. This is the recommended style.

class CnnModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout2d(p=0.1),
            nn.AdaptiveMaxPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 10)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Model was imported from torchkeras at the top of this article
model = Model(CnnModel())
print(model)

CnnModel(
  (layers): ModuleList(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Dropout2d(p=0.1, inplace=False)
    (5): AdaptiveMaxPool2d(output_size=(1, 1))
    (6): Flatten()
    (7): Linear(in_features=64, out_features=32, bias=True)
    (8): ReLU()
    (9): Linear(in_features=32, out_features=10, bias=True)
  )
)

model.summary(input_shape=(1, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 30, 30]             320
         MaxPool2d-2           [-1, 32, 15, 15]               0
            Conv2d-3           [-1, 64, 11, 11]          51,264
         MaxPool2d-4             [-1, 64, 5, 5]               0
         Dropout2d-5             [-1, 64, 5, 5]               0
 AdaptiveMaxPool2d-6             [-1, 64, 1, 1]               0
           Flatten-7                   [-1, 64]               0
            Linear-8                   [-1, 32]           2,080
              ReLU-9                   [-1, 32]               0
           Linear-10                  [-1, 10]             330
================================================================
Total params: 53,994
Trainable params: 53,994
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.003906
Forward/backward pass size (MB): 0.359695
Params size (MB): 0.205971
Estimated Total Size (MB): 0.569572
----------------------------------------------------------------

from sklearn.metrics import accuracy_score

def accuracy(y_pred, y_true):
    y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred), dim=1).data
    return accuracy_score(y_true.numpy(), y_pred_cls.numpy())

model.compile(loss_func=nn.CrossEntropyLoss(),
              optimizer=torch.optim.Adam(model.parameters(), lr=0.02),
              metrics_dict={"accuracy": accuracy})

dfhistory = model.fit(3, dl_train=dl_train, dl_val=dl_valid, log_step_freq=100)

Start Training ...
================================================================================2020-06-26 13:22:39
{'step': 100, 'loss': 0.976, 'accuracy': 0.664}
{'step': 200, 'loss': 0.611, 'accuracy': 0.795}
{'step': 300, 'loss': 0.478, 'accuracy': 0.841}
{'step': 400, 'loss': 0.403, 'accuracy': 0.868}

+-------+-------+----------+----------+--------------+
| epoch |  loss | accuracy | val_loss | val_accuracy |
+-------+-------+----------+----------+--------------+
|   1   | 0.371 |  0.879   |  0.087   |    0.972     |
+-------+-------+----------+----------+--------------+
================================================================================2020-06-26 13:23:59
{'step': 100, 'loss': 0.182, 'accuracy': 0.948}
{'step': 200, 'loss': 0.176, 'accuracy': 0.949}
{'step': 300, 'loss': 0.173, 'accuracy': 0.95}
{'step': 400, 'loss': 0.174, 'accuracy': 0.951}

+-------+-------+----------+----------+--------------+
| epoch |  loss | accuracy | val_loss | val_accuracy |
+-------+-------+----------+----------+--------------+
|   2   | 0.175 |  0.951   |  0.152   |    0.958     |
+-------+-------+----------+----------+--------------+
================================================================================2020-06-26 13:25:22
{'step': 100, 'loss': 0.143, 'accuracy': 0.961}
{'step': 200, 'loss': 0.151, 'accuracy': 0.959}
{'step': 300, 'loss': 0.149, 'accuracy': 0.96}
{'step': 400, 'loss': 0.152, 'accuracy': 0.959}

+-------+-------+----------+----------+--------------+
| epoch |  loss | accuracy | val_loss | val_accuracy |
+-------+-------+----------+----------+--------------+
|   3   | 0.153 |  0.959   |  0.086   |    0.975     |
+-------+-------+----------+----------+--------------+
================================================================================2020-06-26 13:26:48
Finished Training...
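
After fit finishes, predictions can be made with ordinary PyTorch calls; a minimal inference sketch (this assumes the torchkeras Model wrapper forwards its input to the wrapped network, which is how the fit loop above uses it):

model.eval()
with torch.no_grad():
    features, labels = next(iter(dl_valid))
    logits = model(features)             # forward pass through the wrapped CnnModel
    preds = torch.argmax(logits, dim=1)  # predicted class per sample
print(preds[:10])
print(labels[:10])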

If this book helps you and you would like to encourage the author, remember to give the project a star ⭐️ on GitHub and share it with your friends!

If there is anything in the book you would like to discuss further with the author, reply with the keyword 加群 in the public account backend to join the reader discussion group.

Reply with the keyword pytorch in the public account backend to get the GitHub address of this project.
