
Introductory Machine Learning Notes (Week 15)


This week I watched the image-processing course by the Bilibili creator 霹雳吧啦Wz.

Course link: 霹雳吧啦Wz's personal space on Bilibili (哔哩哔哩).

Below is a summary of the lessons covered this week.

Image Classification with GoogLeNet

GoogLeNet is a convolutional neural network architecture proposed by Google; it achieved notable success in the 2014 ImageNet competition.

GoogLeNet's core innovation is the Inception module, which runs convolution kernels of different sizes and a max-pooling operation in parallel, extracting multi-scale features within a single layer.

The overall GoogLeNet architecture is shown in the course figure.

The highlights of the GoogLeNet network are discussed one by one below.

The structure of the Inception module is shown below. The 1x1 convolutions (on a yellow background in the right-hand figure) are there for dimensionality reduction: a 1x1 convolution keeps the spatial size of the feature map while reducing the number of channels, lowering the computational cost. The input image passes through four different parallel branches that each extract features, and the four outputs are finally concatenated along the channel dimension.

Note: the feature matrices produced by the branches must all have the same height and width.

The Inception structure above uses 1x1 kernels for dimensionality reduction. Without them, as the figure below shows, there would be many more parameters and computation would be slower and harder.

With a 1x1 kernel for reduction, as shown below, the 512 channels are first reduced to 24 before the 5x5 convolution, which requires far fewer parameters.
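To make the saving concrete, here is a quick parameter count (my own sketch based on the figure's example, ignoring biases), assuming 512 input channels, 64 output channels for the 5x5 convolution, and a 24-channel reduction:

# 5x5 convolution applied directly to 512 channels, producing 64 channels
direct = 5 * 5 * 512 * 64                        # 819,200 parameters
# 1x1 reduction to 24 channels first, then the 5x5 convolution
reduced = 1 * 1 * 512 * 24 + 5 * 5 * 24 * 64     # 12,288 + 38,400 = 50,688 parameters
print(direct, reduced)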

To alleviate the vanishing-gradient problem, GoogLeNet adds two auxiliary classifiers at intermediate layers. They supply extra gradient signal and help train the deeper network, as shown below.

The last stage of GoogLeNet uses a global average pooling layer: taking the mean over all spatial positions of each feature map reduces the number of parameters and lowers the risk of overfitting.

Concretely, GoogLeNet applies global average pooling right before the main classifier, shrinking each feature map to a height and width of 1 and thereby cutting the parameter count.
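As a quick illustration (my own sketch, not part of the original post), global average pooling in PyTorch collapses each feature map to a single value:

import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)       # N x 1024 x 7 x 7, the shape before GoogLeNet's classifier
gap = nn.AdaptiveAvgPool2d((1, 1))   # average over each 7x7 feature map
print(gap(x).shape)                  # torch.Size([1, 1024, 1, 1])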

Accordingly, GoogLeNet has far fewer parameters than VGGNet (roughly 7 million versus roughly 138 million for VGG16).

Code Implementation

1. Define the BasicConv2d class, a convolution template that passes the input through a convolution layer followed by its activation (the common imports for the script are included at the top):

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    """Convolution template: Conv2d followed by ReLU."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

2. Define the Inception class, which implements GoogLeNet's Inception block. The block has four branches: a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, and a pooling branch. Their outputs differ only in channel count; the heights and widths are identical, so the four outputs are concatenated along the channel dimension.

class Inception(nn.Module):
    # "red" is short for "reduce": dimensionality reduction with a 1x1 convolution
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),  # keeps output size equal to input size
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),  # keeps output size equal to input size
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = [branch1, branch2, branch3, branch4]
        # concatenate along dimension 1 (the channel dimension)
        return torch.cat(outputs, dim=1)

3. Define the InceptionAux class. Auxiliary classifiers are attached after Inception blocks 4a and 4d; they are only active during training and help alleviate the vanishing-gradient problem.

class InceptionAux(nn.Module):
    """Auxiliary classifier."""
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        # average pooling layer
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output (batch, 128, 4, 4)
        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, start_dim=1)  # flatten from the channel dimension onward: N x 2048
        # self.training is controlled by model.train() / model.eval()
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        return x

4. Define the GoogLeNet network.

class GoogLeNet(nn.Module):
    # aux_logits: whether to use the auxiliary classifiers
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)  # ceil_mode=True rounds the output size up
        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)
        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # pool each feature map down to 1x1
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)
        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:  # the auxiliary classifiers run only in training mode
            aux1 = self.aux1(x)
        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:
            aux2 = self.aux2(x)
        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7
        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:
            return x, aux2, aux1  # main classifier plus the two auxiliary classifiers
        return x  # main classifier only

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

5. Select the GPU device.

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

6. Preprocess the images, with data augmentation for training.

import torchvision

data_transform = {
    'train': torchvision.transforms.Compose([
        torchvision.transforms.RandomResizedCrop(224),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    'val': torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ])
}

7. Define the training and validation data loaders.

import os
from torch.utils.data import DataLoader

image_path = '../data/flower_data'
train_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'train'),
                                                 transform=data_transform['train'])
train_num = len(train_dataset)  # 3306
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
val_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'val'),
                                               transform=data_transform['val'])
val_num = len(val_dataset)  # 364
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)
train_steps = len(train_loader)  # 104 = ceil(3306 / 32)
val_steps = len(val_loader)      # 12 = ceil(364 / 32)

8. Map the flower dataset's classes to a dictionary whose keys are the class indices and whose values are the class names, and save it to a JSON file.

import json

flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as f:
    f.write(json_str)
'''
{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}
'''

9. Instantiate the GoogLeNet network and define the loss function and optimizer.

net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.0003)

10. Train on the flower dataset.

Note: during training, the losses of the main classifier and of both auxiliary classifiers are computed and combined in fixed proportions to give the final loss.

import sys
from tqdm import tqdm

best_acc = 0.0
save_path = './GoogLeNet.pth'
epochs = 30
for epoch in range(epochs):
    # training phase
    net.train()
    running_loss = 0.0
    train_bar = tqdm(train_loader, file=sys.stdout)
    for step, data in enumerate(train_bar):
        images, labels = data
        optimizer.zero_grad()
        logits, aux_logits2, aux_logits1 = net(images.to(device))
        loss0 = loss_function(logits, labels.to(device))
        loss1 = loss_function(aux_logits1, labels.to(device))
        loss2 = loss_function(aux_logits2, labels.to(device))
        loss = loss0 + loss1 * 0.3 + loss2 * 0.3  # weighted sum of the main and auxiliary losses
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        train_bar.desc = f'train epoch[{epoch + 1}/{epochs}] loss:{loss:.3f}'
    # validation phase: only the main classifier matters, the auxiliary classifiers are ignored
    net.eval()
    acc = 0.0
    with torch.no_grad():
        val_bar = tqdm(val_loader, file=sys.stdout)
        for step, val_data in enumerate(val_bar):
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
    accurate = acc / val_num
    print(f'[epoch {epoch + 1}] train_loss: {running_loss / train_steps:.3f} val_accuracy: {accurate:.3f}')
    if accurate > best_acc:
        best_acc = accurate
        torch.save(net.state_dict(), save_path)
print('Finished Training')
'''
train epoch[1/30] loss:1.448: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 6.36it/s]
[epoch 1] train_loss: 1.490 val_accuracy: 0.626
train epoch[2/30] loss:1.987: 100%|██████████| 104/104 [00:22<00:00, 4.68it/s]
100%|██████████| 12/12 [00:01<00:00, 6.76it/s]
[epoch 2] train_loss: 1.493 val_accuracy: 0.604
train epoch[3/30] loss:0.985: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 7.11it/s]
[epoch 3] train_loss: 1.384 val_accuracy: 0.679
train epoch[4/30] loss:1.274: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 7.23it/s]
[epoch 4] train_loss: 1.380 val_accuracy: 0.676
train epoch[5/30] loss:1.055: 100%|██████████| 104/104 [00:22<00:00, 4.69it/s]
100%|██████████| 12/12 [00:01<00:00, 6.72it/s]
[epoch 5] train_loss: 1.339 val_accuracy: 0.692
train epoch[6/30] loss:1.568: 100%|██████████| 104/104 [00:21<00:00, 4.83it/s]
100%|██████████| 12/12 [00:01<00:00, 7.29it/s]
[epoch 6] train_loss: 1.264 val_accuracy: 0.706
train epoch[7/30] loss:1.550: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 6.59it/s]
[epoch 7] train_loss: 1.224 val_accuracy: 0.720
train epoch[8/30] loss:0.771: 100%|██████████| 104/104 [00:21<00:00, 4.82it/s]
100%|██████████| 12/12 [00:01<00:00, 7.18it/s]
[epoch 8] train_loss: 1.144 val_accuracy: 0.698
train epoch[9/30] loss:2.318: 100%|██████████| 104/104 [00:21<00:00, 4.90it/s]
100%|██████████| 12/12 [00:01<00:00, 7.16it/s]
[epoch 9] train_loss: 1.189 val_accuracy: 0.717
train epoch[10/30] loss:0.495: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 6.88it/s]
[epoch 10] train_loss: 1.137 val_accuracy: 0.690
train epoch[11/30] loss:0.274: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 7.16it/s]
[epoch 11] train_loss: 1.108 val_accuracy: 0.695
train epoch[12/30] loss:0.913: 100%|██████████| 104/104 [00:21<00:00, 4.79it/s]
100%|██████████| 12/12 [00:01<00:00, 6.85it/s]
[epoch 12] train_loss: 1.120 val_accuracy: 0.698
train epoch[13/30] loss:1.103: 100%|██████████| 104/104 [00:21<00:00, 4.74it/s]
100%|██████████| 12/12 [00:01<00:00, 6.95it/s]
[epoch 13] train_loss: 1.037 val_accuracy: 0.670
train epoch[14/30] loss:1.682: 100%|██████████| 104/104 [00:21<00:00, 4.84it/s]
100%|██████████| 12/12 [00:01<00:00, 7.12it/s]
[epoch 14] train_loss: 1.081 val_accuracy: 0.736
train epoch[15/30] loss:1.607: 100%|██████████| 104/104 [00:22<00:00, 4.69it/s]
100%|██████████| 12/12 [00:01<00:00, 6.90it/s]
[epoch 15] train_loss: 0.998 val_accuracy: 0.736
train epoch[16/30] loss:0.204: 100%|██████████| 104/104 [00:21<00:00, 4.74it/s]
100%|██████████| 12/12 [00:01<00:00, 6.93it/s]
[epoch 16] train_loss: 0.981 val_accuracy: 0.750
train epoch[17/30] loss:0.499: 100%|██████████| 104/104 [00:21<00:00, 4.77it/s]
100%|██████████| 12/12 [00:01<00:00, 6.72it/s]
[epoch 17] train_loss: 0.958 val_accuracy: 0.736
train epoch[18/30] loss:0.666: 100%|██████████| 104/104 [00:22<00:00, 4.66it/s]
100%|██████████| 12/12 [00:01<00:00, 7.21it/s]
[epoch 18] train_loss: 0.949 val_accuracy: 0.777
train epoch[19/30] loss:1.036: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 7.26it/s]
[epoch 19] train_loss: 0.954 val_accuracy: 0.761
train epoch[20/30] loss:1.162: 100%|██████████| 104/104 [00:22<00:00, 4.70it/s]
100%|██████████| 12/12 [00:01<00:00, 6.83it/s]
[epoch 20] train_loss: 0.896 val_accuracy: 0.772
train epoch[21/30] loss:0.682: 100%|██████████| 104/104 [00:21<00:00, 4.81it/s]
100%|██████████| 12/12 [00:01<00:00, 6.87it/s]
[epoch 21] train_loss: 0.924 val_accuracy: 0.755
train epoch[22/30] loss:1.488: 100%|██████████| 104/104 [00:21<00:00, 4.76it/s]
100%|██████████| 12/12 [00:01<00:00, 6.91it/s]
[epoch 22] train_loss: 0.880 val_accuracy: 0.758
train epoch[23/30] loss:1.137: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 6.99it/s]
[epoch 23] train_loss: 0.866 val_accuracy: 0.766
train epoch[24/30] loss:0.498: 100%|██████████| 104/104 [00:21<00:00, 4.82it/s]
100%|██████████| 12/12 [00:01<00:00, 6.97it/s]
[epoch 24] train_loss: 0.872 val_accuracy: 0.753
train epoch[25/30] loss:0.650: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 6.75it/s]
[epoch 25] train_loss: 0.798 val_accuracy: 0.786
train epoch[26/30] loss:1.176: 100%|██████████| 104/104 [00:21<00:00, 4.83it/s]
100%|██████████| 12/12 [00:01<00:00, 7.06it/s]
[epoch 26] train_loss: 0.801 val_accuracy: 0.780
train epoch[27/30] loss:0.439: 100%|██████████| 104/104 [00:21<00:00, 4.84it/s]
100%|██████████| 12/12 [00:01<00:00, 6.96it/s]
[epoch 27] train_loss: 0.874 val_accuracy: 0.720
train epoch[28/30] loss:0.958: 100%|██████████| 104/104 [00:21<00:00, 4.76it/s]
100%|██████████| 12/12 [00:01<00:00, 7.07it/s]
[epoch 28] train_loss: 0.834 val_accuracy: 0.819
train epoch[29/30] loss:0.478: 100%|██████████| 104/104 [00:22<00:00, 4.60it/s]
100%|██████████| 12/12 [00:01<00:00, 6.50it/s]
[epoch 29] train_loss: 0.803 val_accuracy: 0.775
train epoch[30/30] loss:0.976: 100%|██████████| 104/104 [00:22<00:00, 4.60it/s]
100%|██████████| 12/12 [00:01<00:00, 6.97it/s]
[epoch 30] train_loss: 0.754 val_accuracy: 0.780
Finished Training
'''

11. Once the model has been saved, predict an image. First apply the preprocessing transforms to it:

from PIL import Image

data_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
img = Image.open('../Test2_alexnet/tulip.jpg')
img = data_transform(img)
img = torch.unsqueeze(img, dim=0)  # add the batch dimension
try:
    json_file = open('./class_indices.json', 'r')
    class_indices = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

12. Instantiate the GoogLeNet model and load the saved weights:

model = GoogLeNet(num_classes=5, aux_logits=False)
model_weight_path = './GoogLeNet.pth'
# with strict=False, the auxiliary-classifier weights are simply not loaded into this model
missing_keys, unexpected_keys = model.load_state_dict(torch.load(model_weight_path, map_location='cpu'), strict=False)
print(missing_keys)
print(unexpected_keys)
'''
[]
['aux1.conv.conv.weight', 'aux1.conv.conv.bias', 'aux1.fc1.weight', 'aux1.fc1.bias', 'aux1.fc2.weight', 'aux1.fc2.bias', 'aux2.conv.conv.weight', 'aux2.conv.conv.bias', 'aux2.fc1.weight', 'aux2.fc1.bias', 'aux2.fc2.weight', 'aux2.fc2.bias']
'''

13. Run the prediction:

model.eval()
with torch.no_grad():
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=-1)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indices[str(predict_cla)], predict[predict_cla].item())
'''
tulips 0.9999539852142334
'''

Image Classification with ResNet

ResNet is a deep neural network architecture built around residual (shortcut) connections, which let gradients flow around certain layers during training and thereby alleviate the vanishing-gradient problem in deep networks.

The overall ResNet architecture is as follows:

The highlights of the network are as follows:

In the Residual block, the output of the main branch is added element-wise to the input, not concatenated, so the two must have the same height, width, and channel count.

Moreover, for a plain (non-residual) network, deeper is not automatically better: as the figure shows, a 56-layer plain network can perform much worse than a 20-layer one. Plain deep networks suffer from vanishing/exploding gradients and from the degradation problem. ResNet largely avoids these issues: as layers are added, its training loss keeps decreasing.

The bottleneck Residual structure uses 1x1 kernels to reduce and then restore the channel dimension. Counting each convolution's parameters as kernel height x kernel width x input depth x output depth and summing over the layers, the 1x1-reduction variant needs far fewer parameters, as shown below:
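Here is the arithmetic for the figure's example (my own check, ignoring biases), assuming a 256-channel input with a 64-channel bottleneck:

# two plain 3x3 convolutions on 256 channels
plain = 3 * 3 * 256 * 256 + 3 * 3 * 256 * 256                        # 1,179,648 parameters
# 1x1 reduce to 64, 3x3 on 64 channels, 1x1 expand back to 256
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256   # 69,632 parameters
print(plain, bottleneck)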

The Residual structure adds the input to the main-branch output through a shortcut connection, which requires the two to have identical dimensions. When they differ, the dashed-line variant inserts a 1x1 convolution on the shortcut to adjust the channel count and the height/width so that the shapes match before the addition.

Batch Normalization

Image preprocessing standardizes the input images, which speeds up convergence; Batch Normalization extends the same idea to the network's intermediate feature maps, normalizing each dimension to zero mean and unit variance. In practice it is usually placed between a convolution layer and the ReLU activation.

Note: BN adjusts the distribution of a whole batch of data, not of a single sample.

In the figure below there are two samples with two channels each. The mean and variance are computed per channel over all samples in the batch, and the normalized feature maps are then obtained from the BN formula.
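A minimal sketch (my own, not from the course) verifying that BatchNorm2d normalizes each channel across the whole batch:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 2, 2, 2)              # a batch of 2 samples with 2 channels each
bn = nn.BatchNorm2d(2, affine=False)     # no learnable gamma/beta: pure normalization
y = bn(x)
# statistics are pooled over the batch and spatial dimensions, per channel
print(y.mean(dim=(0, 2, 3)))                  # close to 0
print(y.var(dim=(0, 2, 3), unbiased=False))   # close to 1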

Reference: Batch Normalization详解以及pytorch实验 (CSDN blog).

Transfer Learning

With transfer learning you can reach a good result quickly, and you can train a reasonable model even when the dataset is small.

Transfer learning reuses the generic features a network has already learned: as the figure shows, the early layers capture general, low-level information, so transferring those weights lets a new task converge to a good result quickly.

The common transfer-learning strategies (as listed in the course) are: (1) load the pretrained weights and retrain all parameters; (2) load the pretrained weights and train only the last few fully connected layers; (3) load the pretrained weights, keep them fixed, and add a new fully connected layer for the new task. Strategies 2 and 3 train faster, while strategy 1 tends to give more accurate results.

Code Implementation

The ResNet architecture is as follows:

Before writing the code we need to understand the two variants of the ResNet block.

For ResNet-18 and ResNet-34, the two variants are: the first keeps the input and output channel counts the same; the second changes them, doubling the channel count while halving the height and width.

For ResNet-50, ResNet-101, and ResNet-152, the two variants are: the first keeps the input and output channel counts the same; in the second they differ, and for the first block of the conv2 stage only the channel count changes (height and width are preserved), while in the other stages the first block changes both the channel count and the spatial size.

1. Knowing the two block variants, we first implement the block used by ResNet-18 and ResNet-34:

In the code, expansion is the factor by which the block's final convolution expands the channel count; for these two networks the block's output channels equal its first convolution's output channels, so expansion is 1. downsample is the downsampling branch applied to the shortcut whenever the block changes the spatial size or channel count.

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=3, stride=stride,
                               padding=1, bias=False)  # no bias: it would be cancelled by the following BN
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=1, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # downsampling branch for the shortcut

    def forward(self, x):
        identity = x
        if self.downsample is not None:  # None means the solid-line variant: the input passes through unchanged
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity
        out = self.relu(out)
        return out

2. Now implement the block used by ResNet-50, ResNet-101, and ResNet-152:

Here expansion is 4: the block's final convolution outputs 4 times as many channels as its first two layers. Unlike the previous block, this one first reduces the channel count with a 1x1 convolution, extracts features with a 3x3 convolution, and finally restores the channel count with another 1x1 convolution. downsample again handles the shortcut when the spatial size or channel count changes.

class Bottleneck(nn.Module):
    expansion = 4  # the last conv outputs 4x the channels of the first two

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion, kernel_size=1,
                               stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)  # must match conv3's output channels
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out

3. With the blocks implemented, define the ResNet network itself:

In this code, include_top controls whether the classification head (global average pooling plus the fully connected layer) is built; it defaults to True and can be set to False when ResNet is used as a feature extractor inside a larger architecture. _make_layer builds each stage; its channel argument is the number of kernels used by the first convolutions of the residual blocks in that stage, and the stage's final output channel count differs between ResNet-18/34 and ResNet-50/101/152 by the factor expansion.

Note carefully: for ResNet-50/101/152, the first block of layer1 changes only the channel count, not the height and width; in the other stages the first block changes both (the second block variant), while the remaining blocks of each stage are the first variant, with output channels equal to input channels. For ResNet-18/34, layer1 changes neither the spatial size nor the channel count, while in the other stages the first block doubles the channels and halves the height and width.

class ResNet(nn.Module):
    def __init__(self, block, block_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64  # channel count after the max-pooling layer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxPool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, block_num[0])
        self.layer2 = self._make_layer(block, 128, block_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, block_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, block_num[3], stride=2)
        if self.include_top:
            self.avgPool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion),
            )
        layers = []
        layers.append(block(self.in_channel, channel, stride, downsample))
        self.in_channel = channel * block.expansion
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxPool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        if self.include_top:
            x = self.avgPool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
        return x

4. Define ResNet34 and ResNet101 factory functions that return the ResNet defined above:

The lists such as [3, 4, 6, 3] give the number of blocks in each stage; num_classes is the number of output classes.

def ResNet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes, include_top)


def ResNet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes, include_top)
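As a quick sanity check (my own sketch, not part of the original post), instantiate a network and confirm the output shape:

net = ResNet34(num_classes=5)
x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB image
print(net(x).shape)               # torch.Size([1, 5])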

5. Train the model. Data preprocessing and the training/validation datasets are the same as before. The pretrained ResNet-34 weights are downloaded from the PyTorch site (https://download.pytorch.org/models/resnet34-b627a593.pth) and loaded into the network. Remember to replace the final layer so its output feature count matches the task, since ResNet defaults to 1000 output features.

net = ResNet34()
# load the pretrained weights
model_weight_path = './resnet34-pre.pth'
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path), strict=False)
print(f'missing_keys: {missing_keys}, unexpected_keys: {unexpected_keys}')
# replace the classifier so it outputs 5 classes
in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)

6. The model-training code is the same as before.

'''
train epoch [1/5], loss: 0.5164: 100%|██████████| 207/207 [00:33<00:00, 6.10it/s]
100%|██████████| 23/23 [00:03<00:00, 7.45it/s]
epoch: 1, train loss: 0.5003, val acc: 0.8901
save model to /kaggle/working/Resnet34.pth
train epoch [2/5], loss: 0.6781: 100%|██████████| 207/207 [00:24<00:00, 8.28it/s]
100%|██████████| 23/23 [00:02<00:00, 10.96it/s]
epoch: 2, train loss: 0.3391, val acc: 0.9258
save model to /kaggle/working/Resnet34.pth
train epoch [3/5], loss: 0.3169: 100%|██████████| 207/207 [00:24<00:00, 8.38it/s]
100%|██████████| 23/23 [00:02<00:00, 11.15it/s]
epoch: 3, train loss: 0.2870, val acc: 0.8984
train epoch [4/5], loss: 0.1521: 100%|██████████| 207/207 [00:24<00:00, 8.29it/s]
100%|██████████| 23/23 [00:01<00:00, 11.52it/s]
epoch: 4, train loss: 0.2592, val acc: 0.9203
train epoch [5/5], loss: 0.6599: 100%|██████████| 207/207 [00:24<00:00, 8.29it/s]
100%|██████████| 23/23 [00:02<00:00, 10.61it/s]
epoch: 5, train loss: 0.2376, val acc: 0.9093
Finished Training
'''

7. After training, load the saved weights and predict one image:

model = ResNet34(num_classes=5)
model_weight_path = './ResNet34.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
model.eval()
with torch.no_grad():
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=-1)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indices[str(predict_cla)], predict[predict_cla])
'''
tulips 0.9997768998146057
'''

Image Classification with ResNeXt

ResNeXt is a deep convolutional network architecture that builds on ResNet with additional innovations.

Highlights: compared with ResNet, the block is updated, and ResNeXt introduces grouped convolution. Its convolution layers split the input feature maps into several groups, convolve each group separately, and then merge the results, which lowers the computational cost while keeping good representational power.

The figure below shows the ResNet block on the left and the ResNeXt block on the right.

At a 224x224 input size, ResNeXt-101 outperforms both the original ResNet-101 and ResNet-200, so the architecture is a genuine improvement.

ResNeXt-50 also needs fewer parameters than ResNet-50.

  • ResNeXt introduces grouped convolution, which lowers the computational complexity and reduces the parameter count. In the figure, the top shows an ordinary convolution: the input feature maps have Cin channels and there are n kernels, each also with Cin channels, so the output has n channels. With kernel height and width both k, an ordinary convolution has k*k*Cin*n parameters.
  • Grouped convolution splits the Cin input channels into g groups. With n output channels overall, each group uses n/g kernels, so each group needs k*k*(Cin/g)*(n/g) parameters; summing over the g groups gives k*k*Cin*n*(1/g) parameters in total (see the sketch after this list).
  • Whenever the number of groups is greater than 1, grouped convolution therefore needs fewer parameters than ordinary convolution. In the special case where the number of groups equals the number of input channels and the input channel count equals the output channel count, each input channel gets its own single-channel kernel, which is exactly depthwise (DW) convolution.
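A minimal sketch (my own, not from the course) verifying the 1/g saving with the groups argument of nn.Conv2d:

import torch.nn as nn

ordinary = nn.Conv2d(64, 128, kernel_size=3, bias=False)             # k*k*Cin*n = 3*3*64*128 = 73,728 weights
grouped = nn.Conv2d(64, 128, kernel_size=3, groups=32, bias=False)   # 73,728 / 32 = 2,304 weights
print(ordinary.weight.numel(), grouped.weight.numel())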

  • The ResNeXt block has three equivalent forms, (a), (b), and (c); (c) is the most compact and (a) the most explicit.
  • In this architecture, a 256-channel input is first split into 32 paths. Each path applies a 1x1 convolution with 4 output channels, so the combined output has 32 x 4 = 128 channels, which is the dimensionality reduction.
  • Each of the 32 paths then applies a 3x3 convolution, still with 4 output channels.
  • Finally each path applies a 1x1 convolution with 256 output channels, and the 32 results are summed, which is equivalent to the concatenate-then-convolve formulation of (b). Adding this sum to the block's input completes the block.

Why set the number of groups to 32? The paper's experiments show that with 32 groups, ResNeXt-50 and ResNeXt-101 achieve the lowest error.

Finally, the paper notes that when a block has fewer than 3 layers, grouping does not reduce the complexity or the parameter count: the computation is mathematically identical to an ordinary convolutional block.

Code Implementation

Recall: ResNeXt's grouped convolution splits the input feature maps into groups, convolves each group separately, and merges the results, lowering the computational cost while keeping good representational power.

ResNeXt is only applied on top of ResNet-50/101/152, because the ResNeXt paper notes that when a block has fewer than 3 layers, grouping does not reduce the complexity or the parameter count.

The ResNet-50 and ResNeXt-50 architectures are compared below:

Comparing them, the number of blocks per stage is unchanged; what changes is the channel count at the start of each stage and the grouped convolution in the middle of each block. So we only need to add the number of groups and the per-group convolution width width_per_group to the earlier ResNet code.

The channel count at the start of each stage is computed as width = int(out_channel * (width_per_group / 64)) * groups; with groups = 32 and width_per_group = 4 this equals 2 * out_channel. groups and width_per_group are the ResNeXt hyperparameters.

1. Define the Bottleneck class, which is the ResNeXt block:

The constructor takes the number of groups and the per-group convolution width. With the defaults (groups=1, width_per_group=64) the block behaves exactly like the ResNet block, with no grouping.

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()
        # with groups=32 and width_per_group=4, width = out_channel * 2
        width = int(out_channel * (width_per_group / 64)) * groups
        self.conv1 = nn.Conv2d(in_channel, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, groups=groups, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_channel * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out

2. Define the ResNeXt class; apart from the group parameters it matches the ResNet class:

class ResNeXt(nn.Module):
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True, groups=1, width_per_group=64):
        super(ResNeXt, self).__init__()
        self.include_top = include_top
        self.in_channel = 64
        self.groups = groups
        self.width_per_group = width_per_group
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion),
            )
        layers = []
        layers.append(block(in_channel=self.in_channel, out_channel=channel, stride=stride, downsample=downsample,
                            groups=self.groups, width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion
        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel, groups=self.groups, width_per_group=self.width_per_group))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
        return x

3. Define the resnext50_32x4d and resnext101_32x8d factory functions, which pass the group count and per-group width into the class and return a ResNeXt instance:

def resnext50_32x4d(num_classes=1000, include_top=True):
    groups = 32
    width_per_group = 4
    return ResNeXt(Bottleneck, [3, 4, 6, 3], num_classes, include_top, groups, width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    groups = 32
    width_per_group = 8
    return ResNeXt(Bottleneck, [3, 4, 23, 3], num_classes, include_top, groups, width_per_group)

4. Train. Data preprocessing and the datasets are the same as before; the difference is that we instantiate the ResNeXt network and freeze all weights except the fully connected layer.

First download the ResNeXt-50 pretrained weights: https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth

net = resnext50_32x4d()
model_weight_path = './resnext50_pre.pth'
net.load_state_dict(torch.load(model_weight_path))
for param in net.parameters():
    param.requires_grad = False  # freeze everything; the new fc layer below is trainable by default
in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)

5. Optimize only the fully connected layer's parameters:

params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0001)

6. The model-training code is the same as before.

'''
train epoch [1/10], loss: 0.3461: 100%|██████████| 207/207 [00:21<00:00, 9.76it/s]
100%|██████████| 23/23 [00:02<00:00, 8.59it/s]
epoch: 1, train loss: 0.5362, val acc: 0.8709
save model to /kaggle/working/Resnext50.pth
train epoch [2/10], loss: 0.8194: 100%|██████████| 207/207 [00:21<00:00, 9.50it/s]
100%|██████████| 23/23 [00:02<00:00, 9.08it/s]
epoch: 2, train loss: 0.5200, val acc: 0.8846
save model to /kaggle/working/Resnext50.pth
train epoch [3/10], loss: 0.5423: 100%|██████████| 207/207 [00:21<00:00, 9.67it/s]
100%|██████████| 23/23 [00:02<00:00, 9.61it/s]
epoch: 3, train loss: 0.5047, val acc: 0.8929
save model to /kaggle/working/Resnext50.pth
train epoch [4/10], loss: 0.4425: 100%|██████████| 207/207 [00:21<00:00, 9.73it/s]
100%|██████████| 23/23 [00:02<00:00, 9.40it/s]
epoch: 4, train loss: 0.4673, val acc: 0.8901
train epoch [5/10], loss: 0.9804: 100%|██████████| 207/207 [00:21<00:00, 9.51it/s]
100%|██████████| 23/23 [00:02<00:00, 9.52it/s]
epoch: 5, train loss: 0.4588, val acc: 0.9011
save model to /kaggle/working/Resnext50.pth
train epoch [6/10], loss: 0.4052: 100%|██████████| 207/207 [00:21<00:00, 9.80it/s]
100%|██████████| 23/23 [00:02<00:00, 9.24it/s]
epoch: 6, train loss: 0.4590, val acc: 0.8874
train epoch [7/10], loss: 0.3596: 100%|██████████| 207/207 [00:21<00:00, 9.66it/s]
100%|██████████| 23/23 [00:02<00:00, 9.22it/s]
epoch: 7, train loss: 0.4366, val acc: 0.9038
save model to /kaggle/working/Resnext50.pth
train epoch [8/10], loss: 0.5366: 100%|██████████| 207/207 [00:21<00:00, 9.66it/s]
100%|██████████| 23/23 [00:02<00:00, 9.57it/s]
epoch: 8, train loss: 0.4463, val acc: 0.8874
train epoch [9/10], loss: 0.6280: 100%|██████████| 207/207 [00:21<00:00, 9.49it/s]
100%|██████████| 23/23 [00:02<00:00, 9.39it/s]
epoch: 9, train loss: 0.4193, val acc: 0.9121
save model to /kaggle/working/Resnext50.pth
train epoch [10/10], loss: 0.4213: 100%|██████████| 207/207 [00:21<00:00, 9.67it/s]
100%|██████████| 23/23 [00:02<00:00, 8.85it/s]
epoch: 10, train loss: 0.4153, val acc: 0.8984
Finished Training
'''

7. After training, load the saved weights and predict one image:

model = resnext50_32x4d(num_classes=5)
model_weight_path = './ResNext50.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
model.eval()
with torch.no_grad():
    output = model(img)
    output = torch.squeeze(output)
    predict = torch.softmax(output, dim=-1)
    idx = torch.argmax(predict, dim=-1).item()
    print('img class: {}, predict class: {:.4f}'.format(class_indices[str(idx)], predict[idx]))
'''
img class: tulips, predict class: 0.9963
'''

8. To predict a batch of images, collect them in a list and stack them into one batch:

img_path_list = ['tulip.jpg', 'rose.jpg']
img_list = []
for img_path in img_path_list:
    assert os.path.exists(img_path), f'file {img_path} does not exist'
    img = Image.open(img_path)
    img = data_transform(img)
    img_list.append(img)
batch_img = torch.stack(img_list, dim=0)  # stack along a new batch dimension
print('batch_img shape:', batch_img.shape)
'''
batch_img shape: torch.Size([2, 3, 224, 224])
'''

9. When predicting a batch, the model loading is the same as before; loop over the outputs to print each image's predicted class and probability:

model.eval()
with torch.no_grad():
    outputs = model(batch_img)
    predict = torch.softmax(outputs, dim=-1)
    idx_list = torch.argmax(predict, dim=-1).numpy()
    for step, idx in enumerate(idx_list):
        print('image_path: {}, image_class: {}, image_predict: {:.4f}'.format(
            img_path_list[step], class_indices[str(idx)], predict[step][idx]))
'''
image_path: ../ResNext/tulip.jpg, image_class: tulips, image_predict: 0.9963
image_path: ../ResNext/rose.jpg, image_class: roses, image_predict: 0.9509
'''

MobileNet

MobileNet is an efficient convolutional network architecture designed for fast computation and inference on mobile and edge devices.

MobileNet v1

The highlights of the MobileNet v1 network are as follows:

The core of MobileNet is the depthwise separable convolution. In an ordinary convolution, each kernel has as many channels as the input feature maps, and the output channel count equals the number of kernels. Depthwise (DW) convolution instead convolves each channel independently: every kernel has exactly 1 channel, and the number of input channels, the number of kernels, and the number of output channels are all equal, as if each input channel got its own kernel with no summation across channels.

Pointwise (PW) convolution follows the DW convolution. It works like an ordinary convolution except that every kernel is 1x1 (with as many channels as the input); the number of kernels determines the final output channel count.

An ordinary convolution and the DW+PW combination produce the same output channel count, but DW+PW requires far less computation.
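A minimal PyTorch sketch of a depthwise separable convolution (my own illustration, not the course code):

import torch
import torch.nn as nn

in_ch, out_ch = 32, 64
# DW: groups=in_channels gives each input channel its own single-channel 3x3 kernel
dw = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False)
# PW: an ordinary 1x1 convolution that mixes channels and sets the output depth
pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
x = torch.randn(1, in_ch, 56, 56)
print(pw(dw(x)).shape)   # torch.Size([1, 64, 56, 56]), same shape as an ordinary 3x3 conv with 64 kernels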

The computational cost of a convolution is (kernel size)^2 x (output feature-map size)^2 x input channels x output channels.

Let DF be the height/width of the feature map, DK the kernel size, M the input depth, and N the output depth (the number of kernels), and assume stride 1 so the spatial size is unchanged.

The cost of an ordinary convolution is then: DK * DK * M * N * DF * DF.

The cost of DW + PW convolution is: DK * DK * M * DF * DF + M * N * DF * DF.

As the figure shows, an ordinary 3x3 convolution theoretically costs 8 to 9 times as much as DW+PW.
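The ratio follows directly from dividing the two formulas (a worked step added here): the DW+PW cost over the ordinary cost is 1/N + 1/DK^2, so for a 3x3 kernel and a reasonably large N the ordinary convolution is roughly 8 to 9 times more expensive:

# (DK*DK*M*DF*DF + M*N*DF*DF) / (DK*DK*M*N*DF*DF) = 1/N + 1/DK**2
DK, N = 3, 64
ratio = 1 / N + 1 / DK ** 2
print(ratio, 1 / ratio)   # ~0.127, i.e. the ordinary convolution is ~7.9x more expensive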

The MobileNet v1 architecture is as follows:

Here α is the width multiplier (a multiplier on the number of kernels) and β the resolution multiplier (the input size). Experiments show that MobileNet is slightly less accurate than VGG16 but needs far less computation and far fewer parameters.

MobileNet v2

MobileNet v2 is more accurate and smaller than MobileNet v1. Its highlights are:

ResNet's residual (bottleneck) structure first reduces the dimensionality, applies a convolution, then expands again: wide at both ends and narrow in the middle, using the ReLU activation.

MobileNet v2's inverted residual structure does the opposite: it first expands the dimensionality, applies a DW convolution, then reduces it again: narrow at both ends and wide in the middle, using the ReLU6 activation.

The ReLU6 activation behaves as follows: ordinary ReLU outputs 0 for negative inputs and passes positive inputs through unchanged; ReLU6 outputs 0 for negative inputs, passes inputs between 0 and 6 through unchanged, and clamps anything above 6 to 6.
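In other words, relu6(x) = min(max(x, 0), 6); a one-line check:

import torch
import torch.nn.functional as F

print(F.relu6(torch.tensor([-2.0, 3.0, 8.0])))   # tensor([0., 3., 6.])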

In each MobileNet v2 block, the last 1x1 convolution uses a linear activation instead of ReLU, because ReLU causes heavy information loss on low-dimensional features.

The MobileNet v2 block looks like this:

Suppose the input has height h, width w, and k channels. A 1x1 convolution first expands it to h x w x (tk), where t is the expansion factor; a 3x3 DW convolution (which does not change the channel count) then produces (h/s) x (w/s) x (tk), where s is the stride; finally a 1x1 convolution reduces it to (h/s) x (w/s) x k', where k' is the specified output channel count, with the height and width unchanged by this last step.

Note that the shortcut connection exists only when stride = 1 and the input and output feature maps have the same shape; only then can the element-wise addition be performed. With stride = 2 the shapes differ, and since there is no ResNet-style 1x1 transform on the shortcut here, the block simply has no shortcut.
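A minimal sketch of an inverted residual block following these rules (my own illustration; the official torchvision implementation differs in details):

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride, expand_t):
        super().__init__()
        hidden = in_ch * expand_t
        # shortcut only when stride is 1 and the input/output shapes match
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),   # 1x1 expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride, padding=1,
                      groups=hidden, bias=False),                  # 3x3 DW convolution
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),  # 1x1 projection, linear activation
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

x = torch.randn(1, 16, 56, 56)
print(InvertedResidual(16, 16, stride=1, expand_t=6)(x).shape)  # shortcut used: [1, 16, 56, 56]
print(InvertedResidual(16, 24, stride=2, expand_t=6)(x).shape)  # no shortcut: [1, 24, 28, 28]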

The MobileNet v2 architecture is as follows:

Experiments show that on both image classification and object detection benchmarks, MobileNet v2 outperforms MobileNet v1 and comparable networks in accuracy while needing fewer parameters.

Personal Summary

This week I studied image-processing methods and theory and implemented image classification with several networks. Next week I will continue with other models, algorithms, and theory, read the related papers, and combine theory with practice.
