
CV09_Stitching Deep Learning Modules Together (4) -- Hyperparameter Tuning

Deep learning is like alchemy: the model is the furnace, the parameters are the heat, and the dataset is the raw material.

1.1 What the Parameters Are

We talk about "tuning the parameters", but which parameters are we actually tuning?

1. Network-related parameters:
(1) the number of network layers
(2) the number of neurons in each hidden layer
(3) the number of convolution kernels
(4) the choice of loss function

2. Data-preprocessing parameters:
(1) batch normalization
(2) and so on

3. Hyperparameters:
(1) the activation function
(2) weight initialization (Kaiming initialization, etc.)
(3) the gradient-descent optimizer (SGD, Adam)
(4) the number of epochs
(5) the batch size
(6) the learning rate (lr)
(7) learning-rate decay schedules, regularization, etc.
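For orientation, here is a minimal sketch of where these knobs live in a typical PyTorch setup. The concrete values below are illustrative placeholders, not recommendations:

```python
import torch.nn as nn
from torch import optim

# Hyperparameters (placeholder values -- tune them for your own task)
batch_size = 64        # samples per gradient step
learning_rate = 1e-3   # optimizer step size
num_epochs = 10        # full passes over the training set
weight_decay = 1e-4    # L2 regularization strength

# Network-related choices: depth, hidden width, activation function
model = nn.Sequential(
    nn.Linear(784, 128),   # hidden layer width: 128 neurons
    nn.ReLU(),             # activation function
    nn.Linear(128, 10),
)

# The optimizer itself (SGD vs. Adam) and its settings are hyperparameters too
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

# Learning-rate decay: here, halve the learning rate every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
```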

1.2 Common Situations and Their Causes

1. These are the typical outcomes of training a network:
(1) Overfitting -- the number of samples is too small
(2) Underfitting -- plenty of samples, but the model is too simple
(3) Fitting, but fluctuating up and down (oscillating)
(4) Fitting well
(5) The model does not converge

1.3 Remedies

(1) Overfitting -- data augmentation, early stopping, dropout, a lower learning rate, adjusting the number of epochs (see the sketch after this list)

(2) Underfitting -- add more layers; prefer nonlinear activation functions such as ReLU

(3) Fitting but oscillating -- weaken the data augmentation, lower the learning rate

(4) Fitting well -- no adjustment needed

(5) Model not converging -- the dataset may have problems, or the network architecture may have problems
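As an illustration, here is a minimal sketch of two of the overfitting remedies above, dropout and early stopping. The `train_one_epoch` and `evaluate` helpers and the `patience` value are placeholders assumed for this example, not part of the original script:

```python
import torch
import torch.nn as nn

# Dropout: randomly zero activations during training so the network
# cannot rely too heavily on any single neuron
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # the dropout probability is itself a hyperparameter
    nn.Linear(128, 10),
)

# Early stopping: quit when the validation loss stops improving
num_epochs, patience = 50, 3
best_val_loss, stale_epochs = float("inf"), 0
for epoch in range(num_epochs):
    train_one_epoch(model)      # placeholder: one pass over the training data
    val_loss = evaluate(model)  # placeholder: average loss on a validation set
    if val_loss < best_val_loss:
        best_val_loss, stale_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best weights
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break
```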

1.4 The Tuning Process

1. Build the network model.

2. Try the model on a small sample first to gauge its behavior (a sketch of this step follows the list).

3. Based on the small-sample results, tune the parameters, which includes analyzing the loss.
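One way to carry out step 2, sketched with torch.utils.data.Subset, which exposes only the chosen indices of a dataset (the 1,000-sample size is an arbitrary choice for illustration):

```python
from torch.utils.data import DataLoader, Subset
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
train_set = torchvision.datasets.MNIST(root="./data", train=True,
                                       download=True, transform=transform)

# Keep only the first 1,000 samples for a quick sanity check
small_train_set = Subset(train_set, range(1000))
small_loader = DataLoader(small_train_set, batch_size=64, shuffle=True)

# A healthy model should be able to drive the loss down on this small subset;
# if it cannot, look for bugs before tuning anything else.
```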

1.5 Code (CPU Training)

```python
# Author: SiZhen
# Create: 2024/7/14
# Description: hyperparameter-tuning practice, using the MNIST handwritten-digit dataset
import torch
import torch.nn as nn
import torch.utils.data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from torch import optim
from torch.nn import init
from torch.nn import Softmax

# Set the hyperparameters
batch_size = 64
hidden_size = 128
learning_rate = 0.001
num_epoch = 10

# Preprocess the images into PyTorch tensors
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])  # MNIST is single-channel, hence a single 0.5 per tuple

# Download the training set
train_set = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
# Download the test set
test_set = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)
# Load the datasets
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)  # no shuffling for the test set

input_size = 784  # MNIST images are 28x28 pixels
num_classes = 10


class SEAttention(nn.Module):
    # Initialize the SE module; channel is the channel count, reduction the squeeze ratio.
    # Caution: with the defaults channel=1, reduction=8, channel // reduction is 0,
    # so the squeeze layer has zero width; pick reduction <= channel for real use.
    def __init__(self, channel=1, reduction=8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # adaptive average pooling squeezes the spatial dims to 1x1
        self.fc = nn.Sequential(  # two fully connected layers form the excitation: squeeze, then restore the channel dim
            nn.Linear(channel, channel // reduction, bias=False),  # squeeze to cut parameters and compute
            nn.ReLU(inplace=True),  # ReLU introduces nonlinearity
            nn.Linear(channel // reduction, channel, bias=False),  # restore the original channel count
            nn.Sigmoid(),  # sigmoid outputs an importance coefficient per channel
        )

    # Weight initialization
    def init_weights(self):
        for m in self.modules():  # iterate over all submodules
            if isinstance(m, nn.Conv2d):  # convolutional layers
                init.kaiming_normal_(m.weight, mode='fan_out')  # Kaiming initialization
                if m.bias is not None:
                    init.constant_(m.bias, 0)  # zero the bias if present
            elif isinstance(m, nn.BatchNorm2d):  # batch-normalization layers
                init.constant_(m.weight, 1)  # weights to 1
                init.constant_(m.bias, 0)  # biases to 0
            elif isinstance(m, nn.Linear):  # fully connected layers
                init.normal_(m.weight, std=0.001)  # normally distributed weights
                if m.bias is not None:
                    init.constant_(m.bias, 0)  # zero the bias if present

    # Forward pass
    def forward(self, x):
        b, c, _, _ = x.size()  # batch size b and channel count c of the input
        y = self.avg_pool(x).view(b, c)  # pool, then reshape to match the fully connected input
        y = self.fc(y).view(b, c, 1, 1)  # per-channel importance, reshaped to match the feature map
        return x * y.expand_as(x)  # recalibrate the feature map with the channel coefficients


# Build a diagonal of -inf, used to mask out self-attention positions in the attention matrix
def INF(B, H, W):
    return -torch.diag(torch.tensor(float("inf")).repeat(H), 0).unsqueeze(0).repeat(B * W, 1, 1)


class CrissCrossAttention(nn.Module):
    """Criss-Cross Attention Module"""
    def __init__(self, in_dim):
        super(CrissCrossAttention, self).__init__()
        # Q, K, V projection layers
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        # Softmax normalizes the attention scores
        self.softmax = Softmax(dim=3)
        self.INF = INF
        # Learnable scale that modulates how much the attention contributes
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        m_batchsize, _, height, width = x.size()
        # Compute the query (Q), key (K), and value (V) projections
        proj_query = self.query_conv(x)
        proj_query_H = proj_query.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height).permute(0, 2, 1)
        proj_query_W = proj_query.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width).permute(0, 2, 1)
        proj_key = self.key_conv(x)
        proj_key_H = proj_key.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_key_W = proj_key.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        proj_value = self.value_conv(x)
        proj_value_H = proj_value.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_value_W = proj_value.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        # Attention scores along the vertical and horizontal axes; the -inf mask suppresses self-attention
        energy_H = (torch.bmm(proj_query_H, proj_key_H) + self.INF(m_batchsize, height, width)).view(m_batchsize, width, height, height).permute(0, 2, 1, 3)
        energy_W = torch.bmm(proj_query_W, proj_key_W).view(m_batchsize, height, width, width)
        # Softmax normalization over the concatenated vertical and horizontal scores
        concate = self.softmax(torch.cat([energy_H, energy_W], 3))
        # Split the attention back into the two directions and apply it to the value (V) matrices
        att_H = concate[:, :, :, 0:height].permute(0, 2, 1, 3).contiguous().view(m_batchsize * width, height, height)
        att_W = concate[:, :, :, height:height + width].contiguous().view(m_batchsize * height, width, width)
        # Final output, with the input x added back as a residual connection
        out_H = torch.bmm(proj_value_H, att_H.permute(0, 2, 1)).view(m_batchsize, width, -1, height).permute(0, 2, 3, 1)
        out_W = torch.bmm(proj_value_W, att_W.permute(0, 2, 1)).view(m_batchsize, height, -1, width).permute(0, 2, 1, 3)
        return self.gamma * (out_H + out_W) + x


class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.conv1 = nn.Conv2d(1, 64, kernel_size=1)
        self.se = SEAttention(channel=1)
        self.cca = CrissCrossAttention(64)
        self.conv2 = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        x = self.se(x)
        x = self.conv1(x)
        x = self.cca(x)
        x = self.conv2(x)
        out = self.fc1(x.view(-1, input_size))
        out = self.relu(out)
        out = self.fc2(out)
        return out


model = Net(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
train_loss_list = []
test_loss_list = []

# Training
total_step = len(train_loader)
for epoch in range(num_epoch):
    for i, (images, labels) in enumerate(train_loader):
        outputs = model(images)  # model predictions
        loss = criterion(outputs, labels)  # compute the loss
        optimizer.zero_grad()  # clear gradients before backpropagation
        loss.backward()  # backpropagate
        optimizer.step()  # update the parameters
        train_loss_list.append(loss.item())
        if (i + 1) % 100 == 0:
            print('Epoch[{}/{}],Step[{}/{}],Train Loss:{:.4f}'
                  .format(epoch + 1, num_epoch, i + 1, total_step, loss.item()))
    model.eval()
    with torch.no_grad():  # disable gradient computation
        test_loss = 0.0
        for images, labels in test_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            test_loss += loss.item() * images.size(0)  # accumulate each batch's summed loss
        test_loss /= len(test_loader.dataset)  # average loss over the whole test set
        # Repeat the single average test loss total_step times so each training
        # step has a matching test-loss point in the plot, even though the test
        # loss is constant within an epoch
        test_loss_list.extend([test_loss] * total_step)  # convenient for visualization
    model.train()
    print("Epoch[{}/{}],Test Loss:{:.4f}".format(epoch + 1, num_epoch, test_loss))

plt.plot(train_loss_list, label='Train Loss')
plt.plot(test_loss_list, label='Test Loss')
plt.title('model loss')
plt.xlabel('iterations')
plt.ylabel('Loss')
plt.legend()
plt.show()
```

1.6 Code (GPU Training)

To train the model on the GPU, two things are required:

(1) the model must be on the GPU;

(2) every tensor involved in the computation must be on the GPU.
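The pattern, distilled into a runnable toy example (a single nn.Linear stands in for the real network):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)  # (1) the model lives on the device
x = torch.randn(64, 784).to(device)    # (2) so does every tensor it touches
logits = model(x)                      # both operands on one device: no error
```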

```python
# Author: SiZhen
# Create: 2024/7/14
# Description: hyperparameter-tuning practice, using the MNIST handwritten-digit dataset
import torch
import torch.nn as nn
import torch.utils.data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from torch import optim
from torch.nn import init
from torch.nn import Softmax

# Train on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set the hyperparameters
batch_size = 64
hidden_size = 128
learning_rate = 0.001
num_epoch = 10

# Preprocess the images into PyTorch tensors
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])  # MNIST is single-channel, hence a single 0.5 per tuple

# Download the training set
train_set = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
# Download the test set
test_set = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)
# Load the datasets; pin_memory speeds up host-to-GPU transfers
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, pin_memory=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False, pin_memory=True)  # no shuffling for the test set

input_size = 784  # MNIST images are 28x28 pixels
num_classes = 10


class SEAttention(nn.Module):
    # Initialize the SE module; channel is the channel count, reduction the squeeze ratio.
    # Caution: with the defaults channel=1, reduction=8, channel // reduction is 0,
    # so the squeeze layer has zero width; pick reduction <= channel for real use.
    def __init__(self, channel=1, reduction=8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # adaptive average pooling squeezes the spatial dims to 1x1
        self.fc = nn.Sequential(  # two fully connected layers form the excitation: squeeze, then restore the channel dim
            nn.Linear(channel, channel // reduction, bias=False),  # squeeze to cut parameters and compute
            nn.ReLU(inplace=True),  # ReLU introduces nonlinearity
            nn.Linear(channel // reduction, channel, bias=False),  # restore the original channel count
            nn.Sigmoid(),  # sigmoid outputs an importance coefficient per channel
        )

    # Weight initialization
    def init_weights(self):
        for m in self.modules():  # iterate over all submodules
            if isinstance(m, nn.Conv2d):  # convolutional layers
                init.kaiming_normal_(m.weight, mode='fan_out')  # Kaiming initialization
                if m.bias is not None:
                    init.constant_(m.bias, 0)  # zero the bias if present
            elif isinstance(m, nn.BatchNorm2d):  # batch-normalization layers
                init.constant_(m.weight, 1)  # weights to 1
                init.constant_(m.bias, 0)  # biases to 0
            elif isinstance(m, nn.Linear):  # fully connected layers
                init.normal_(m.weight, std=0.001)  # normally distributed weights
                if m.bias is not None:
                    init.constant_(m.bias, 0)  # zero the bias if present

    # Forward pass
    def forward(self, x):
        b, c, _, _ = x.size()  # batch size b and channel count c of the input
        y = self.avg_pool(x).view(b, c)  # pool, then reshape to match the fully connected input
        y = self.fc(y).view(b, c, 1, 1)  # per-channel importance, reshaped to match the feature map
        return x * y.expand_as(x)  # recalibrate the feature map with the channel coefficients


# Build a diagonal of -inf, used to mask out self-attention positions in the attention matrix
def INF(B, H, W):
    return -torch.diag(torch.tensor(float("inf")).repeat(H), 0).unsqueeze(0).repeat(B * W, 1, 1)


class CrissCrossAttention(nn.Module):
    """Criss-Cross Attention Module"""
    def __init__(self, in_dim):
        super(CrissCrossAttention, self).__init__()
        # Q, K, V projection layers
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        # Softmax normalizes the attention scores
        self.softmax = Softmax(dim=3)
        self.INF = INF
        # Learnable scale that modulates how much the attention contributes
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        m_batchsize, _, height, width = x.size()
        # Compute the query (Q), key (K), and value (V) projections
        proj_query = self.query_conv(x)
        proj_query_H = proj_query.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height).permute(0, 2, 1)
        proj_query_W = proj_query.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width).permute(0, 2, 1)
        proj_key = self.key_conv(x)
        proj_key_H = proj_key.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_key_W = proj_key.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        proj_value = self.value_conv(x)
        proj_value_H = proj_value.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_value_W = proj_value.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        # Attention scores along the vertical and horizontal axes; the -inf mask
        # suppresses self-attention, and since INF() builds it on the CPU it must
        # be moved to the device
        energy_H = (torch.bmm(proj_query_H, proj_key_H) + self.INF(m_batchsize, height, width).to(device)).view(m_batchsize, width, height, height).permute(0, 2, 1, 3)
        energy_W = torch.bmm(proj_query_W, proj_key_W).view(m_batchsize, height, width, width)
        # Softmax normalization over the concatenated vertical and horizontal scores
        concate = self.softmax(torch.cat([energy_H, energy_W], 3))
        # Split the attention back into the two directions and apply it to the value (V) matrices
        att_H = concate[:, :, :, 0:height].permute(0, 2, 1, 3).contiguous().view(m_batchsize * width, height, height)
        att_W = concate[:, :, :, height:height + width].contiguous().view(m_batchsize * height, width, width)
        # Final output, with the input x added back as a residual connection
        out_H = torch.bmm(proj_value_H, att_H.permute(0, 2, 1)).view(m_batchsize, width, -1, height).permute(0, 2, 3, 1)
        out_W = torch.bmm(proj_value_W, att_W.permute(0, 2, 1)).view(m_batchsize, height, -1, width).permute(0, 2, 1, 3)
        return self.gamma * (out_H + out_W) + x


class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.conv1 = nn.Conv2d(1, 64, kernel_size=1)
        self.se = SEAttention(channel=1)
        self.cca = CrissCrossAttention(64)
        self.conv2 = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        x = self.se(x)
        x = self.conv1(x)
        x = self.cca(x)
        x = self.conv2(x)
        out = self.fc1(x.view(-1, input_size))
        out = self.relu(out)
        out = self.fc2(out)
        return out


model = Net(input_size, hidden_size, num_classes)
model.to(device)  # (1) put the model on the GPU
criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
train_loss_list = []
test_loss_list = []

# Training
total_step = len(train_loader)
for epoch in range(num_epoch):
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)  # (2) move the batch to the GPU
        outputs = model(images)  # model predictions (already on the device)
        loss = criterion(outputs, labels)  # compute the loss
        optimizer.zero_grad()  # clear gradients before backpropagation
        loss.backward()  # backpropagate
        optimizer.step()  # update the parameters
        train_loss_list.append(loss.item())
        if (i + 1) % 100 == 0:
            print('Epoch[{}/{}],Step[{}/{}],Train Loss:{:.4f}'
                  .format(epoch + 1, num_epoch, i + 1, total_step, loss.item()))
    model.eval()
    with torch.no_grad():  # disable gradient computation
        test_loss = 0.0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            test_loss += loss.item() * images.size(0)  # accumulate each batch's summed loss
        test_loss /= len(test_loader.dataset)  # average loss over the whole test set
        # Repeat the single average test loss total_step times so each training
        # step has a matching test-loss point in the plot, even though the test
        # loss is constant within an epoch
        test_loss_list.extend([test_loss] * total_step)  # convenient for visualization
    model.train()
    print("Epoch[{}/{}],Test Loss:{:.4f}".format(epoch + 1, num_epoch, test_loss))

plt.plot(train_loss_list, label='Train Loss')
plt.plot(test_loss_list, label='Test Loss')
plt.title('model loss')
plt.xlabel('iterations')
plt.ylabel('Loss')
plt.legend()
plt.show()
```

1.7 Comparing Tuning Runs

As the loss plot shows, the model's loss in its original configuration is already very small.

Now let's raise the number of hidden-layer neurons from 128 to 256 and shrink the learning rate further to 0.0005, and see how it does.
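The only two lines that change relative to the script above:

```python
hidden_size = 256        # was 128
learning_rate = 0.0005   # was 0.001
```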

The result improves slightly, but the test-set curve now fluctuates somewhat in the later stages and is less stable than the previous model.
