
CV Experiment Notes: GoogLeNet Training Summary

1. GoogLeNet Network Structure

GoogLeNet is 22 layers deep (counting only the layers with parameters; including pooling layers it has 27), and it introduces the Inception module into the architecture, which further improves the model's overall performance. Despite its depth, the model is much smaller than AlexNet and VGG: GoogLeNet has about 5 million parameters (5M), while VGG16 has 138M, more than 27 times as many, and VGG16 in turn has more than twice as many parameters as AlexNet.
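As a quick sanity check, you can count a model's trainable parameters in PyTorch. A minimal sketch, assuming the GoogLeNet class from the model.py listing below is importable the same way main.py imports it (note that this CIFAR-10 variant differs from the original ImageNet GoogLeNet, so its count will not match the 5M figure exactly):

import torch
from models import GoogLeNet

net = GoogLeNet()
# Sum the element counts of all trainable tensors.
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f'{n_params / 1e6:.2f}M parameters')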

2. GoogLeNet Innovations

1. Inception module: convolution kernels of different sizes perceive the image at different scales, and their outputs are fused by concatenation, producing a richer representation of the image (see the sketch after this list).

2. 1x1 convolutions: GoogLeNet uses 1x1 kernels as a dimensionality-reduction step, shrinking the channel count before the more expensive convolutions. This both reduces computational cost and improves the model's expressive power.

3. Global average pooling: at the end of the network, GoogLeNet averages each channel of the final convolutional feature map over its spatial dimensions, yielding a fixed-size feature vector. This lowers the parameter count and reduces the risk of overfitting.
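A minimal sketch of all three ideas with dummy tensors (the shapes and channel counts here are illustrative, not taken from the paper):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)  # dummy feature map: N x C x H x W

# (1) Multi-scale branches keep H x W via padding, so outputs concatenate on channels.
b1 = nn.Conv2d(256, 64, kernel_size=1)(x)             # 1x1
b3 = nn.Conv2d(256, 64, kernel_size=3, padding=1)(x)  # 3x3
b5 = nn.Conv2d(256, 64, kernel_size=5, padding=2)(x)  # 5x5
fused = torch.cat([b1, b3, b5], dim=1)                # -> 1 x 192 x 28 x 28

# (2) 1x1 convolution as channel reduction before an expensive 5x5.
# Direct 5x5: 256*64*5*5 = 409,600 weights; with a 1x1 bottleneck to 32 channels:
# 256*32*1*1 + 32*64*5*5 = 8,192 + 51,200 = 59,392 -- roughly 7x fewer.
reduce = nn.Sequential(nn.Conv2d(256, 32, 1), nn.Conv2d(32, 64, 5, padding=2))
out5 = reduce(x)                                      # -> 1 x 64 x 28 x 28

# (3) Global average pooling collapses H x W to 1 x 1.
gap = nn.AdaptiveAvgPool2d(1)(fused)                  # -> 1 x 192 x 1 x 1
vec = gap.flatten(1)                                  # -> 1 x 192 fixed-size vector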

3. Training and Testing

We made several improvements to the original GoogLeNet:

1. Batch Normalization: an nn.BatchNorm2d layer is added after every convolutional layer, which helps speed up training and improves the model's generalization.

2. Activation functions: nn.ReLU(True) is applied after every convolutional layer.

3. Improved 5x5 branch: in the original GoogLeNet, the 5x5 branch uses a single 5x5 kernel. In the improved Inception module, it is replaced by two consecutive 3x3 convolutions, which reduces the parameter count and adds non-linearity (a quick comparison follows this list).
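To see why the substitution helps, compare the weight counts for C input and C output channels. A hypothetical check with C = 64 (the numbers are mine, not from the original post):

C = 64
p5 = C * C * 5 * 5         # one 5x5 conv: 102,400 weights
p33 = 2 * (C * C * 3 * 3)  # two stacked 3x3 convs: 73,728 weights
print(p5, p33, p33 / p5)   # -> 102400 73728 0.72

The two stacked 3x3 convolutions cover the same 5x5 receptive field with about 28% fewer weights, and the ReLU between them adds an extra non-linearity.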

The initial learning rate is 0.01, the scheduler is a cosine annealing learning-rate scheduler, and the optimizer is SGD (stochastic gradient descent).
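For reference, PyTorch's CosineAnnealingLR decays the learning rate along a half-cosine curve; with initial rate \eta_{max} = 0.01 and the default \eta_{min} = 0, the rate at epoch t is

\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\frac{t\pi}{T_{max}}\right)

Note that the code below sets T_max = 200 while training runs for 100 epochs, so the schedule only traverses the first half of the cosine and ends near \eta_{max}/2 rather than at \eta_{min}.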

Structure of the original GoogLeNet model: (figure not reproduced here)

Structure of fine-tuned GoogLeNet model 1: (figure not reproduced here)

Structure of fine-tuned GoogLeNet model 2: (figure not reproduced here)

The full code is shown below:

main.py

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
from models import *

# Data augmentation for training; plain normalization for testing.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

trainset = torchvision.datasets.CIFAR10(root=r'D:\CV\pytorch\dataset\cifar-10-batches-py',
                                        train=True, download=True, transform=transform_train)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testset = torchvision.datasets.CIFAR10(root=r'D:\CV\pytorch\dataset\cifar-10-batches-py',
                                       train=False, download=True, transform=transform_test)
testLoader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

writer = SummaryWriter(r'D:\CV\pytorch\pytorch-cifar-master\logs_googlenet_100ep')

# net = VGG('VGG16')
# net = ResNet50()
net = GoogLeNet()
# net = ResNet18()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)  # tried 0.1, 0.01
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

total_times = 100
accuracy_rate = []

for epoch in range(total_times):
    # ---- training ----
    net.train()
    running_loss = 0.0
    total_train_correct = 0
    total_train_samples = 0
    for i, (data, labels) in enumerate(trainLoader, 0):
        data = data.to(device)
        labels = labels.to(device)
        outputs = net(data)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = outputs.max(1)
        total_train_correct += (pred == labels).sum().item()
        total_train_samples += data.shape[0]
    train_loss = running_loss / len(trainLoader)
    train_accuracy = total_train_correct / total_train_samples
    writer.add_scalar('Train/Loss', train_loss, epoch)
    writer.add_scalar('Train/Accuracy', train_accuracy, epoch)
    print('epoch[%d] train_loss: %.4f, train_acc: %.4f' % (epoch + 1, train_loss, train_accuracy))

    # ---- evaluation ----
    net.eval()
    correct = 0  # number of correctly classified images
    total = 0    # total number of images
    losses = []  # per-batch test losses
    with torch.no_grad():
        for images, labels in testLoader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = net(images)
            # The index of the max logit per sample is the predicted class label.
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            losses.append(criterion(outputs, labels).item())
    accuracy = 100 * correct / total
    accuracy_rate.append(accuracy)
    mean_loss = sum(losses) / len(losses)  # mean test loss over batches
    writer.add_scalar('Test/Loss', mean_loss, epoch)
    writer.add_scalar('Test/Accuracy', accuracy, epoch)
    print(f'epoch[{epoch + 1}] test_loss: {mean_loss:.4f} test_acc: {accuracy:.2f}%')

    scheduler.step()

writer.close()
torch.save(net.state_dict(), r'D:\CV\pytorch\pytorch-cifar-master\res\GoogleNet_100epoch.pth')

# Plot test accuracy over epochs.
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, total_times, total_times)
plt.xlabel('epoch')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
print(accuracy_rate)
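After training, the saved weights can be reloaded for inference. A minimal sketch (the path matches the torch.save call above; the class comes from the model listing below):

import torch
from models import GoogLeNet

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = GoogLeNet()
net.load_state_dict(torch.load(r'D:\CV\pytorch\pytorch-cifar-master\res\GoogleNet_100epoch.pth',
                               map_location=device))
net.to(device)
net.eval()  # disable BatchNorm running-stat updates during inference
with torch.no_grad():
    logits = net(torch.randn(1, 3, 32, 32).to(device))  # dummy CIFAR-10-sized input
    print(logits.argmax(1))  # predicted class index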

model.py

'''GoogLeNet with PyTorch.'''
import torch
import torch.nn as nn


class Inception(nn.Module):
    def __init__(self, in_planes, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_planes):
        super(Inception, self).__init__()
        # 1x1 conv branch
        self.b1 = nn.Sequential(
            nn.Conv2d(in_planes, n1x1, kernel_size=1),
            nn.BatchNorm2d(n1x1),
            nn.ReLU(True),
        )
        # 1x1 conv -> 3x3 conv branch
        self.b2 = nn.Sequential(
            nn.Conv2d(in_planes, n3x3red, kernel_size=1),
            nn.BatchNorm2d(n3x3red),
            nn.ReLU(True),
            nn.Conv2d(n3x3red, n3x3, kernel_size=3, padding=1),
            nn.BatchNorm2d(n3x3),
            nn.ReLU(True),
        )
        # 1x1 conv -> 5x5 branch, with the 5x5 conv replaced by two stacked 3x3 convs
        self.b3 = nn.Sequential(
            nn.Conv2d(in_planes, n5x5red, kernel_size=1),
            nn.BatchNorm2d(n5x5red),
            nn.ReLU(True),
            nn.Conv2d(n5x5red, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5),
            nn.ReLU(True),
            nn.Conv2d(n5x5, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5),
            nn.ReLU(True),
        )
        # 3x3 pool -> 1x1 conv branch
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_planes, pool_planes, kernel_size=1),
            nn.BatchNorm2d(pool_planes),
            nn.ReLU(True),
        )

    def forward(self, x):
        y1 = self.b1(x)
        y2 = self.b2(x)
        y3 = self.b3(x)
        y4 = self.b4(x)
        # Concatenate the four branches along the channel dimension.
        return torch.cat([y1, y2, y3, y4], 1)


class GoogLeNet(nn.Module):
    def __init__(self):
        super(GoogLeNet, self).__init__()
        self.pre_layers = nn.Sequential(
            nn.Conv2d(3, 192, kernel_size=3, padding=1),
            nn.BatchNorm2d(192),
            nn.ReLU(True),
        )
        # Inception(in_planes, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_planes)
        self.a3 = Inception(192, 64, 96, 128, 16, 32, 32)
        self.b3 = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        self.a4 = Inception(480, 192, 96, 208, 16, 48, 64)
        self.b4 = Inception(512, 160, 112, 224, 24, 64, 64)
        self.c4 = Inception(512, 128, 128, 256, 24, 64, 64)
        self.d4 = Inception(512, 112, 144, 288, 32, 64, 64)
        self.e4 = Inception(528, 256, 160, 320, 32, 128, 128)
        self.a5 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.b5 = Inception(832, 384, 192, 384, 48, 128, 128)
        self.avgpool = nn.AvgPool2d(8, stride=1)  # global average pooling over the 8x8 map
        self.linear = nn.Linear(1024, 10)

    def forward(self, x):
        out = self.pre_layers(x)
        out = self.a3(out)
        out = self.b3(out)
        out = self.maxpool(out)
        out = self.a4(out)
        out = self.b4(out)
        out = self.c4(out)
        out = self.d4(out)
        out = self.e4(out)
        out = self.maxpool(out)
        out = self.a5(out)
        out = self.b5(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)  # flatten to N x 1024
        out = self.linear(out)
        return out


def test():
    net = GoogLeNet()
    x = torch.randn(1, 3, 32, 32)
    y = net(x)
    print(y.size())


if __name__ == '__main__':
    test()
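Running the file directly calls test(), which should print torch.Size([1, 10]): a 32x32 input passes through two stride-2 max pools to reach the average pool as an 8x8 map with 1024 channels (384 + 384 + 128 + 128 from the last Inception block), is flattened to 1024 features, and the final linear layer maps those to the 10 CIFAR-10 classes.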

Some of the training and test hyperparameters can be tuned to match your machine's compute budget. After training for 100 epochs, the highest test accuracy on the CIFAR-10 dataset was 92.68%.

 

