1 Introduction to VGGNet

1.1 VGGNet Overview

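VGGNet is a model proposed by the Visual Geometry Group at the University of Oxford, hence the name VGGNet. In ILSVRC 2014 it placed second in the classification task and first in the localization task. The model showed that increasing the depth of a network can, up to a point, improve its final performance.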



1.2 VGG16 Network Structure



1.2.1 Input Layer

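The input is usually a three-channel color image, so the input tensor has shape [B,N,H,W], where B is the batch size, N is the number of input channels (3 for a color image), H is the height (224), and W is the width (224).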
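As a small illustration of the [B,N,H,W] layout, the sketch below builds a batch from a randomly generated stand-in image. The variable names and the batch size of 4 are assumptions made for the example, and the scaling step only mimics what transforms.ToTensor does.

```python
import torch

# A hypothetical decoded image in height x width x channel layout, values 0..255.
image_hwc = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8)

# Move the channel axis first and scale to [0, 1], similar to transforms.ToTensor.
image_chw = image_hwc.permute(2, 0, 1).float() / 255.0

# Stack B = 4 copies to form a batch of shape [B, N, H, W].
batch = torch.stack([image_chw] * 4)
print(batch.shape)  # torch.Size([4, 3, 224, 224])
```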

The input image passes through two 3×3 convolutions, each followed by BatchNorm and ReLU, and the output has shape [B,64,224,224].

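Two points are worth noting here. First, a convolution layer by default leaves the spatial size unchanged (3×3 kernel, stride 1, padding 1). Second, the convolution layer, the BN layer, and the activation function normally appear together. Both points have become default conventions.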
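The first point follows from the standard output-size formula, floor((size + 2 × padding - kernel) / stride) + 1. The helper below is a minimal sketch of that formula; the function name conv_out_size is made up for this example.

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# 3x3 convolution with stride 1 and padding 1: the size is preserved.
same = conv_out_size(224, kernel=3, stride=1, padding=1)

# 2x2 max pooling with stride 2: the size is halved.
halved = conv_out_size(224, kernel=2, stride=2, padding=0)

print(same, halved)  # 224 112
```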


1.2.2 Second Convolution Block

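By the second convolution block, max pooling has halved the spatial size, so the input has shape [B,64,112,112]. The number of output channels becomes 128, so the channel count has to be widened when the convolution is defined.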

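As the figure above shows, this block contains two 128-channel convolutions, but the two are not identical. The first takes 64 input channels and produces 128 output channels, so the shape becomes [B,128,112,112]; it is followed by a BN layer and an activation layer. The second takes 128 channels in and 128 channels out, so the shape stays the same.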
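The second block can be sketched in PyTorch as follows. The helper conv_bn_relu and the batch size of 1 are assumptions for this illustration, and the model is put in eval mode only so that BatchNorm uses its running statistics during the shape check.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_channels: int, out_channels: int) -> list:
    """One 3x3 convolution followed by BatchNorm and ReLU."""
    return [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    ]


# First convolution widens 64 -> 128 channels, the second keeps 128 -> 128.
block2 = nn.Sequential(*conv_bn_relu(64, 128), *conv_bn_relu(128, 128))
block2.eval()

x = torch.randn(1, 64, 112, 112)
with torch.no_grad():
    out = block2(x)
print(out.shape)  # torch.Size([1, 128, 112, 112])
```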


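1.2.3 Third Convolution Block

As above, with 256 channels; after this block, including the max pooling at its end, the shape becomes [B,256,28,28].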


1.2.4 Fourth Convolution Block

As above, with 512 channels; after this block, including the max pooling at its end, the shape becomes [B,512,14,14].

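1.2.5 Fifth Convolution Block

As above, with 512 channels; after this block, including the max pooling at its end, the shape becomes [B,512,7,7]. This is the feature map produced by the convolutional part of the network.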
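The shapes after each of the five blocks can be traced without building the network, since every 3×3 convolution keeps the spatial size and every "M" (max pool) halves it. The function trace_shapes below is a hypothetical helper written for this check; the configuration list is the VGG16 layout.

```python
# VGG16 layout: an int is the output channel count of a conv, "M" is a 2x2 max pool.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(config, channels=3, size=224):
    """Return (channels, height, width) after each max pool, i.e. after each block."""
    shapes = []
    for item in config:
        if item == "M":
            size //= 2                      # pooling halves height and width
            shapes.append((channels, size, size))
        else:
            channels = item                 # convolution only changes the channel count
    return shapes


block_shapes = trace_shapes(VGG16)
for index, (c, h, w) in enumerate(block_shapes, start=1):
    print(f"block {index}: [B, {c}, {h}, {w}]")
```

Calling trace_shapes(VGG16, size=32) ends at (512, 1, 1), which is one reason a 32×32 image needs resizing or adaptive pooling before a classifier that expects 512×7×7 features.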

1.2.6 Reshaping


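Before the fully connected layers, the [B,512,7,7] feature map has to be flattened into a 2-D tensor. One way is to use view:

    # inside forward
    x = x.view(x.size(0), -1)

Another is torch.flatten:

    # inside forward
    x = torch.flatten(x, 1)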
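The short check below, a sketch on random data, suggests that the two calls give the same result on a contiguous tensor. It also illustrates one practical difference: view requires a compatible memory layout, whereas torch.flatten copies when it has to. The permuted tensor is an artificial example chosen to trigger that case.

```python
import torch

x = torch.randn(2, 512, 7, 7)

flat_view = x.view(x.size(0), -1)
flat_flatten = torch.flatten(x, 1)
print(flat_view.shape, torch.equal(flat_view, flat_flatten))

# A permuted tensor is no longer contiguous, so view cannot merge its dimensions.
permuted = x.permute(0, 2, 3, 1)
try:
    permuted.view(permuted.size(0), -1)
    view_failed = False
except RuntimeError:
    view_failed = True

# torch.flatten still works because it falls back to copying the data.
flat_permuted = torch.flatten(permuted, 1)
print(view_failed, flat_permuted.shape)
```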


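Current implementations usually add an average pooling layer before flattening. The official PyTorch implementation uses nn.AdaptiveAvgPool2d, and nn.AvgPool2d can be used as well. nn.AdaptiveAvgPool2d takes the desired output size and works out the pooling windows itself, while nn.AvgPool2d takes a fixed kernel size and stride. For more detail, see the article "The difference between nn.AdaptiveAvgPool2d and nn.AvgPool2d".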


    # inside __init__
    self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
    # inside forward
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
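To make the difference between the two pooling layers concrete, here is a minimal sketch on random tensors. The input sizes 14×14 and 1×1 are chosen purely for illustration.

```python
import torch
import torch.nn as nn

adaptive = nn.AdaptiveAvgPool2d((7, 7))        # fixes the output size
fixed = nn.AvgPool2d(kernel_size=2, stride=2)  # fixes the window size

# On a 14x14 map both layers average non-overlapping 2x2 windows.
x14 = torch.randn(1, 512, 14, 14)
same_result = torch.allclose(adaptive(x14), fixed(x14), atol=1e-6)

# On a 1x1 map the adaptive layer still returns 7x7 by repeating the single value.
x1 = torch.randn(1, 512, 1, 1)
out_small = adaptive(x1)

print(same_result, out_small.shape)
```

Because the adaptive layer always returns a 7×7 map, the first fully connected layer can keep its 512×7×7 input size even when the input resolution changes.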

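1.2.7 Fully Connected Layers

After reshaping we have a 2-D tensor of shape [B,512×7×7], which is fed into the first fully connected layer. The first fully connected layer therefore has 512×7×7 = 25088 inputs and 4096 outputs. The second has 4096 inputs and 4096 outputs. The third has 4096 inputs and as many outputs as there are classes; ImageNet has 1000 classes, so it outputs 1000 values.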
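The sizes above also determine how many parameters the classifier holds. The following is a small arithmetic sketch that counts weights plus biases for each fully connected layer; the helper name linear_params is made up for this example.

```python
# (in_features, out_features) of the three fully connected layers for ImageNet.
fc_shapes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def linear_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector of one fully connected layer."""
    return in_features * out_features + out_features


fc_params = [linear_params(i, o) for i, o in fc_shapes]
total_fc_params = sum(fc_params)

print(fc_params)        # [102764544, 16781312, 4097000]
print(total_fc_params)  # 123642856
```

Most of these parameters sit in the first layer, which is a direct consequence of flattening a 512×7×7 feature map.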

2 Implementing VGG16 in PyTorch on the CIFAR-10 Dataset

2.1 The CIFAR-10 Dataset

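CIFAR-10 is a computer vision dataset for general object recognition, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains 60000 RGB color images of size 32 × 32 in 10 classes, of which 50000 form the training set and 10000 form the test set.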

The 10 classes of RGB color images in CIFAR-10 are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

CIFAR-10 is a color image dataset that is closer to general, everyday objects. Compared with MNIST, it differs in the following ways:

  • CIFAR-10 images are 3-channel RGB color images, whereas MNIST images are grayscale.
  • CIFAR-10 images are 32 × 32, slightly larger than the 28 × 28 images of MNIST.

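Unlike handwritten characters, CIFAR-10 shows real-world objects. The images are noisy, and the scale and appearance of the objects vary widely, which makes recognition much harder. A plain linear model such as softmax regression performs poorly on CIFAR-10.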
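For reference, a plain softmax (linear) classifier on CIFAR-10-sized inputs looks roughly like the sketch below. It runs on random tensors rather than the real dataset and is not trained, so it only illustrates the structure and size of such a model, not its accuracy.

```python
import torch
import torch.nn as nn

# Flatten each 3x32x32 image into 3072 values, then apply a single linear layer.
linear_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

images = torch.randn(8, 3, 32, 32)     # random stand-in for a CIFAR-10 batch
logits = linear_model(images)
probs = torch.softmax(logits, dim=1)   # class probabilities per image

num_params = sum(p.numel() for p in linear_model.parameters())
print(probs.shape, num_params)  # torch.Size([8, 10]) 30730
```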

2.2 Code Implementation on CIFAR-10

    import datetime

    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader
    from torchvision import datasets

    # import matplotlib
    # matplotlib.use('TkAgg')

    # Layer configurations: an int is the output channel count of a 3x3
    # convolution, "M" is a 2x2 max pooling layer.
    VGG_types = {
        "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
        "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
        "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512,
                  "M", 512, 512, 512, "M"],
        "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512, 512, 512,
                  "M", 512, 512, 512, 512, "M"],
    }
    VGGType = "VGG16"


    class VGGnet(nn.Module):
        def __init__(self, in_channels=3, num_classes=1000):
            super().__init__()
            self.in_channels = in_channels
            self.conv_layers = self._create_layers(VGG_types[VGGType])
            # The classifier expects a 512 x 7 x 7 feature map, i.e. a 224 x 224 input.
            self.fcs = nn.Sequential(
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(),
                nn.Dropout(p=0.5),
                nn.Linear(4096, 4096),
                nn.ReLU(),
                nn.Dropout(p=0.5),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x):
            x = self.conv_layers(x)
            x = x.reshape(x.shape[0], -1)  # flatten to [B, 512 * 7 * 7]
            x = self.fcs(x)
            return x

        def _create_layers(self, architecture):
            layers = []
            in_channels = self.in_channels
            for x in architecture:
                if isinstance(x, int):
                    out_channels = x
                    # Conv + BN + ReLU; padding 1 keeps the spatial size.
                    layers += [
                        nn.Conv2d(
                            in_channels=in_channels,
                            out_channels=out_channels,
                            kernel_size=(3, 3),
                            stride=(1, 1),
                            padding=(1, 1),
                        ),
                        nn.BatchNorm2d(out_channels),
                        nn.ReLU(),
                    ]
                    in_channels = out_channels
                elif x == "M":
                    # Max pooling halves the height and width.
                    layers += [nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))]
            return nn.Sequential(*layers)


    # Training augmentation; the images are resized to 224 x 224 at the end
    # so that the feature map entering the classifier is 512 x 7 x 7.
    transform_train = transforms.Compose(
        [
            transforms.Pad(4),
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomGrayscale(),
            transforms.RandomCrop(32, padding=4),
            transforms.Resize((224, 224)),
        ]
    )
    transform_test = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
            transforms.Resize((224, 224)),
        ]
    )

    train_data = datasets.CIFAR10(
        root="data",
        train=True,
        download=True,
        transform=transform_train,
    )
    test_data = datasets.CIFAR10(
        root="data",
        train=False,
        download=True,
        transform=transform_test,
    )


    def get_format_time():
        return datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')


    if __name__ == "__main__":
        train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
        test_loader = DataLoader(test_data, batch_size=64, shuffle=False)
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = VGGnet(in_channels=3, num_classes=10).to(device)
        print(model)

        optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
        loss_func = nn.CrossEntropyLoss()
        # Multiply the learning rate by 0.4 every 5 epochs.
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.4)
        epochs = 40
        accuracy_rate = []

        for epoch in range(epochs):
            model.train()
            train_loss = 0.0
            print(f"{get_format_time()},train epoch: {epoch}/{epochs}")
            for step, (images, labels) in enumerate(train_loader):
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = loss_func(outputs, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                # Accumulate the loss once per batch and report it every 100 steps.
                train_loss += loss.item()
                if step % 100 == 0 and step > 0:
                    print(f"{get_format_time()},train epoch = {epoch}, step = {step}, "
                          f"train_loss={train_loss}")
                    train_loss = 0.0

            # Evaluate on the test set.
            model.eval()
            test_correct = 0
            test_total = 0
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.to(device)
                    labels = labels.to(device)
                    outputs = model(images)
                    _, predicted = torch.max(outputs, 1)
                    test_total += labels.size(0)
                    test_correct += (predicted == labels).sum().item()
            accuracy = 100 * test_correct / test_total
            accuracy_rate.append(accuracy)
            print(f"{get_format_time()},test epoch = {epoch}, accuracy={accuracy}")
            scheduler.step()

        # Plot the test accuracy of every epoch.
        accuracy_rate = np.array(accuracy_rate)
        times = np.linspace(1, epochs, epochs)
        plt.xlabel('times')
        plt.ylabel('accuracy rate')
        plt.plot(times, accuracy_rate)
        plt.show()
        print(f"{get_format_time()},accuracy_rate={accuracy_rate}")
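The learning-rate schedule in the script (StepLR with step_size=5 and gamma=0.4, starting from 0.01, stepped once per epoch) can be worked out by hand. The function step_lr below is a hypothetical stand-in that reproduces the rule in plain Python; it is not PyTorch's own implementation.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 5, gamma: float = 0.4) -> float:
    """Learning rate in a given epoch when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.01, epoch) for epoch in range(40)]

# Print the learning rate at the start of each 5-epoch stage.
for epoch in range(0, 40, 5):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.2e}")
```

By the last stage (epochs 35 to 39) the learning rate has shrunk to 0.01 × 0.4^7, about 1.6e-5.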


The print(model) call produces the following output:

    VGGnet(
      (conv_layers): Sequential(
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU()
        (6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
        (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
        (14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (16): ReLU()
        (17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (19): ReLU()
        (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (22): ReLU()
        (23): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
        (24): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (25): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (26): ReLU()
        (27): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (28): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (29): ReLU()
        (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (31): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (32): ReLU()
        (33): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
        (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (35): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (36): ReLU()
        (37): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (38): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (39): ReLU()
        (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (42): ReLU()
        (43): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
      )
      (fcs): Sequential(
        (0): Linear(in_features=25088, out_features=4096, bias=True)
        (1): ReLU()
        (2): Dropout(p=0.5, inplace=False)
        (3): Linear(in_features=4096, out_features=4096, bias=True)
        (4): ReLU()
        (5): Dropout(p=0.5, inplace=False)
        (6): Linear(in_features=4096, out_features=10, bias=True)
      )
    )
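The printed structure contains 13 convolution layers and 3 fully connected layers, which is where the "16" in VGG16 comes from. The sketch below counts the weight layers for each configuration; count_weight_layers is a helper made up for this check, and pooling, BN, and activation layers are deliberately not counted.

```python
VGG_types = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512,
              "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512, 512, 512,
              "M", 512, 512, 512, 512, "M"],
}


def count_weight_layers(config, num_fc: int = 3) -> int:
    """Convolution layers in the config plus the fully connected layers."""
    num_conv = sum(1 for item in config if isinstance(item, int))
    return num_conv + num_fc


depths = {name: count_weight_layers(config) for name, config in VGG_types.items()}
print(depths)  # {'VGG11': 11, 'VGG13': 13, 'VGG16': 16, 'VGG19': 19}
```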


The log of the final epoch, followed by the test accuracy of all 40 epochs, is shown below. The model reaches 88.99% test accuracy in the last epoch. Each train_loss value is a sum over about 100 steps, not a per-batch average; in the run that produced this log every batch loss was added to the sum twice, so the per-batch loss is roughly 0.2.

    2023-12-22 13:56:27,train epoch: 39/40
    2023-12-22 13:56:49,train epoch = 39, step = 100, train_loss=42.07486420869827
    2023-12-22 13:57:13,train epoch = 39, step = 200, train_loss=43.89785052835941
    2023-12-22 13:57:36,train epoch = 39, step = 300, train_loss=41.38636288046837
    2023-12-22 13:58:00,train epoch = 39, step = 400, train_loss=40.616311356425285
    2023-12-22 13:58:24,train epoch = 39, step = 500, train_loss=41.17254985868931
    2023-12-22 13:58:47,train epoch = 39, step = 600, train_loss=40.342166878283024
    2023-12-22 13:59:11,train epoch = 39, step = 700, train_loss=39.25042723119259
    2023-12-22 13:59:40,test epoch = 39, accuracy=88.98999786376953
    2023-12-22 13:59:40,accuracy_rate=[40.629997 37.79 38.98 58.69 43.739998 64.64 66.13
    72. 64.869995 71.99 80.04 80.729996 82.43 81.759995
    83.81 83.31 84.869995 86.369995 86.4 85.95 88.14
    88.13 88.06 86.81 87.979996 88.549995 88.53 88.549995
    88.409996 88.75 88.67 88.79 88.71 88.75 88.75
    88.86 88.63 88.72 88.829994 88.99 ]
