
PyTorch: Multi-class Classification on CIFAR-10 with VGG-16, Reaching 90.97% Accuracy


Contents

1. Introduction
2. The VGG-16 network
3. Building and training VGG-16
  3.1 Network architecture
  3.2 Model training
  3.3 Training results
4. Conclusion


1. Introduction

As a newcomer to convolutional neural networks, I reproduced LeNet, AlexNet, and VGG-16 on the CIFAR-10 dataset. VGG-16 gave the best classification accuracy of the three, so I took it as a base for aggressive hyperparameter tuning and eventually reached 90.97%. (With more trial-and-error tuning it could likely go higher.)


2. The VGG-16 network

VGGNet is a model from the Visual Geometry Group at the University of Oxford (original paper: VGG-16 paper). In the 2014 ILSVRC it took second place in the classification task and first place in the localization task.

[Figure: VGG network architecture]

Architecturally, VGG's hallmark is its uniform use of small 3×3 convolution kernels and small 2×2 pooling windows, combined with greater depth and wider feature maps. It showed that a stack of several small kernels gives better accuracy than a single large kernel while also keeping the computational cost down.
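The saving is easy to verify by counting weights: a stack of three 3×3 convolutions covers the same 7×7 receptive field as a single 7×7 convolution but with far fewer parameters (a pure-Python sketch; biases and the two extra nonlinearities from stacking are ignored):

```python
def conv_params(k, c_in, c_out):
    """Weights in one k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

C = 512
# One 7x7 conv vs. a stack of three 3x3 convs -- same 7x7 receptive field
single_7x7 = conv_params(7, C, C)       # 49 * C^2 = 12845056
stacked_3x3 = 3 * conv_params(3, C, C)  # 27 * C^2 = 7077888
print(single_7x7, stacked_3x3)  # the stack needs ~45% fewer weights
```

As a bonus, the stack also inserts two extra ReLU nonlinearities, which the VGG authors credit with making the decision function more discriminative.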

In the paper the authors present six configurations (A, A-LRN, B, C, D, and E), with 11, 11, 13, 16, 16, and 19 weight layers respectively; the last two, D and E, are the familiar VGG-16 and VGG-19. The model's main drawback is its size: roughly 138 million parameters (usually rounded to ~140M), which demands considerable storage.
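That figure refers to the original ImageNet VGG-16 (224×224 input, 1000 classes); a back-of-the-envelope sketch reproduces it:

```python
# Standard VGG-16 (configuration D): 13 convs + 3 fully connected layers.
# 'M' marks a 2x2 max-pool, which has no weights.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']
params = 0
c_in = 3
for v in cfg:
    if v == 'M':
        continue
    params += 3 * 3 * c_in * v + v  # 3x3 conv weights + biases
    c_in = v
# With 224x224 input, five max-pools leave a 7x7x512 map before the FC stack
for n_in, n_out in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    params += n_in * n_out + n_out  # FC weights + biases
print(f"{params / 1e6:.1f}M parameters")  # 138.4M, commonly quoted as ~140M
```

The fully connected layers dominate: the first FC layer alone accounts for over 100M of the parameters, which is why later architectures dropped it in favor of global pooling.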

[Table: VGG configurations and parameter counts]

3. Building and training VGG-16

3.1 Network architecture

The VGG-16 network is built as follows:

```python
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

transform_train = transforms.Compose([
    transforms.Pad(4),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomGrayscale(),
    transforms.RandomCrop(32, padding=4),
])
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

trainset = torchvision.datasets.CIFAR10(root='dataset_method_1', train=True,
                                        download=True, transform=transform_train)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size=24, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='dataset_method_1', train=False,
                                       download=True, transform=transform_test)
testLoader = torch.utils.data.DataLoader(testset, batch_size=24, shuffle=False)

# Channel configuration; 'M' marks a 2x2 max-pool
vgg = [96, 96, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

class VGG(nn.Module):
    def __init__(self, vgg):
        super(VGG, self).__init__()
        self.features = self._make_layers(vgg)
        self.dense = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
        )
        self.classifier = nn.Linear(4096, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)  # flatten to (batch, 512)
        out = self.dense(out)
        out = self.classifier(out)
        return out

    def _make_layers(self, vgg):
        layers = []
        in_channels = 3
        for x in vgg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)
```
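A detail worth checking: `nn.Linear(512, 4096)` only fits because the five 'M' pooling stages shrink the 32×32 CIFAR-10 input down to a 1×1×512 feature map (the final `AvgPool2d(kernel_size=1, stride=1)` is a no-op). A quick sketch of that arithmetic:

```python
size = 32  # CIFAR-10 spatial resolution; the 3x3/padding-1 convs preserve it
for _ in range(5):   # each 'M' entry is a 2x2 max-pool with stride 2
    size //= 2       # halves the height and width
print(size, size * size * 512)  # -> 1 512: a 1x1x512 map flattens to 512 features
```

With a 224×224 ImageNet input the same five pools would leave 7×7×512 instead, which is where the original paper's 25088-wide first FC layer comes from.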

This structure makes a few small changes to the standard VGG-16:

(1) The channel count of the first two convolutional layers is raised from 64 to 96. (Trial-and-error tuning; it brings a tiny accuracy improvement.)

(2) The Dropout probability of each fully connected layer is lowered from 0.5 to 0.4. On this dataset, 0.5 does not seem to be a good choice: it leaves the running_loss stuck between 400 and 500 late in training, and no matter how small the learning rate is made, the model stops learning. In terms of accuracy, with Dropout at 0.5 the test-set accuracy plateaued around 89% and never broke 90%, while at 0.4 it immediately rose to nearly 91%, with the running_loss eventually dropping to about 300. This suggests that further tuning of the Dropout probability could push accuracy higher still (not attempted here).

Some lessons learned while building the network:

(1) Take image preprocessing and data augmentation seriously. Augmentation enriches the training set, much like generating extra mock exams from past papers for the network to study, which improves generalization and guards against overfitting. Choose augmentations that suit the characteristics of the original dataset; not every augmentation helps, and some actually reduce accuracy. Done well, the gain in classification accuracy is immediate.

(2) Don't set batch_size too small. Initially batch_size was 4, which meant 12,500 training iterations per epoch; even on a GPU this was very slow. Raising it to 24 made things much faster (and it could go higher still).
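Those iteration counts follow directly from the 50,000 training images in CIFAR-10; a quick check:

```python
import math

train_size = 50_000  # CIFAR-10 training set size
for batch_size in (4, 24, 128):
    steps = math.ceil(train_size / batch_size)
    print(f"batch_size={batch_size}: {steps} steps per epoch")
# batch_size=4 gives 12500 steps; batch_size=24 cuts that to 2084
```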

3.2 Model training

The training code:

```python
model = VGG(vgg)
# model.load_state_dict(torch.load('CIFAR-model/VGG16.pth'))
model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
loss_func = nn.CrossEntropyLoss()
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.4, last_epoch=-1)

total_times = 40
accuracy_rate = []

def test():
    model.eval()
    correct = 0  # number of correctly classified images
    total = 0    # total number of images
    with torch.no_grad():
        for images, labels in testLoader:
            images = images.to(device)
            outputs = model(images).cpu()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    accuracy_rate.append(accuracy)
    print(f'Accuracy: {accuracy}%')

for epoch in range(total_times):
    model.train()
    running_loss = 0.0
    total_correct = 0
    total_trainset = 0
    for i, (data, labels) in enumerate(trainLoader):
        data = data.to(device)
        labels = labels.to(device)
        outputs = model(data)
        loss = loss_func(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = outputs.max(1)
        total_correct += (pred == labels).sum().item()
        total_trainset += data.shape[0]
        if i % 1000 == 0 and i > 0:
            print(f"step {i}, running_loss={running_loss}")
            running_loss = 0.0
    test()
    scheduler.step()

# torch.save(model.state_dict(), 'CIFAR-model/VGG16.pth')

accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, total_times, total_times)
plt.xlabel('times')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
print(accuracy_rate)
```

Some lessons learned while training:

(1) SGD worked better than Adam here. Adam does converge faster at first, but later in training the model may struggle to converge any further.

(2) Dynamically adjusting the learning rate with a scheduler is very effective. Early in training, a larger learning rate (here 0.01) speeds up convergence and the running_loss drops quickly; in the middle and late stages, a smaller rate is needed to inch forward. One way to get this is to repeatedly save the model's parameters, hand-edit the learning rate, then reload and continue training; the other is to use one of the dynamic schemes provided by lr_scheduler. Here StepLR is used to adjust the learning rate at fixed intervals: over 40 total epochs, the learning rate is multiplied by 0.4 every 5 epochs (trial-and-error values). During training it is very noticeable that right after each 5-epoch adjustment, classification accuracy jumps sharply relative to the previous epoch.
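The schedule that StepLR with step_size=5 and gamma=0.4 produces over the 40 epochs can be written out in closed form (a sketch, no torch required):

```python
# StepLR multiplies the LR by gamma every step_size epochs,
# i.e. lr(epoch) = initial_lr * gamma ** (epoch // step_size)
initial_lr, gamma, step_size = 0.01, 0.4, 5
schedule = [initial_lr * gamma ** (epoch // step_size) for epoch in range(40)]
print(schedule[0], schedule[5], schedule[39])  # 0.01 decaying to ~1.6e-05
```

By the final epochs the learning rate has shrunk by a factor of 0.4^7, which explains why the last few accuracy gains come in tiny increments.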

3.3 Training results

The complete script is simply the two code blocks above, the network definition from section 3.1 followed by the training code from section 3.2, combined into a single file.

Below is the curve of test-set classification accuracy produced by running this code:

[Figure: test-set accuracy over the training run]

4. Conclusion

VGG is admittedly an antique at this point. To push accuracy further, consider an architecture like ResNet. A paper published in 2022 reached 99.612% accuracy on CIFAR-10 classification, which is just incredible...
