
Summary of Deep Learning Image Classification Models (with Partial Code)


Reference video: "2.1 PyTorch official demo (LeNet)" on bilibili

LeNet(1998)

PyTorch example

Notes

  1. PyTorch tensor dimension order: [batch, channel, height, width].

  2. CIFAR-10 dataset: it has 10 classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.

  3. Why optimizer.zero_grad() has to be called once per batch: PyTorch adds newly computed gradients to the existing .grad of each parameter on every backward() call, so without clearing them the gradients of the previous batch would be mixed into the current update.

    1. See the Zhihu discussion "Why must gradients be manually zeroed before backpropagation in PyTorch?"
  4. PyTorch website: the Docs section describes the usage of each function, and the Tutorials section provides worked examples.

  5. data = torch.max(outputs, dim=1) returns a tuple containing two tensors.

    1. data[0]: the maximum values along dimension 1 (the class dimension) of the outputs tensor. It has shape [batch], where batch is the number of validation images in the batch.
    2. data[1]: the indices of the maximum values along dimension 1 (the class dimension) of the outputs tensor. It also has shape [batch]; each element is the predicted class label (index) for one validation image in the batch.
  6. To get the size of a tensor along a specific dimension, use the size() method or index the shape property.

        import torch

        # Placeholder standing in for a label tensor with shape [batch_size, ...]
        val_label = torch.zeros(8, dtype=torch.long)

        # Using the size() method
        size_along_first_dim = val_label.size(0)

        # Using the shape property
        size_along_first_dim = val_label.shape[0]
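Notes 3 and 5 above can be checked with a small sketch. It assumes only that PyTorch is installed; the tensor `w` and the logits in `outputs` are made-up values for illustration, not taken from the training script.

```python
import torch

# Note 3: gradients accumulate across backward() calls unless they are cleared.
w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter

loss = (w * 3.0).sum()             # d(loss)/dw = [3, 3]
loss.backward()
first_grad = w.grad.clone()

loss = (w * 3.0).sum()             # same loss again, gradients NOT cleared
loss.backward()
accumulated_grad = w.grad.clone()  # the second gradient was added to the first

w.grad.zero_()                     # clear manually; optimizer.zero_grad() clears
                                   # the gradients of all parameters it manages
loss = (w * 3.0).sum()
loss.backward()
fresh_grad = w.grad.clone()

# Note 5: torch.max with dim= returns (values, indices).
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3]])  # hypothetical logits, shape [2, 3]
values, indices = torch.max(outputs, dim=1)

print(first_grad, accumulated_grad, fresh_grad)
print(values, indices)
```

Here `indices` plays the role of `predict_y` in the training loop: one predicted class index per image.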

model

    import torch.nn as nn
    import torch.nn.functional as F


    class LeNet(nn.Module):
        def __init__(self):
            super(LeNet, self).__init__()
            self.conv1 = nn.Conv2d(3, 16, 5)
            self.pool1 = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(16, 32, 5)
            self.pool2 = nn.MaxPool2d(2, 2)
            self.fc1 = nn.Linear(32*5*5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))    # input(3, 32, 32) output(16, 28, 28)
            x = self.pool1(x)            # output(16, 14, 14)
            x = F.relu(self.conv2(x))    # output(32, 10, 10)
            x = self.pool2(x)            # output(32, 5, 5)
            x = x.view(-1, 32*5*5)       # output(32*5*5)
            x = F.relu(self.fc1(x))      # output(120)
            x = F.relu(self.fc2(x))      # output(84)
            x = self.fc3(x)              # output(10)
            return x
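The shape comments in forward() follow from the standard output-size formula for convolution and pooling layers, out = floor((in - kernel + 2 * padding) / stride) + 1. The sketch below retraces them in plain Python; the helper name `conv_out` is made up for this illustration and is not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer (hypothetical helper)."""
    return (size - kernel + 2 * padding) // stride + 1


trace = [32]                                            # CIFAR-10 images are 32x32
trace.append(conv_out(trace[-1], kernel=5))             # conv1: 5x5 kernel, stride 1
trace.append(conv_out(trace[-1], kernel=2, stride=2))   # pool1: 2x2 window, stride 2
trace.append(conv_out(trace[-1], kernel=5))             # conv2: 5x5 kernel, stride 1
trace.append(conv_out(trace[-1], kernel=2, stride=2))   # pool2: 2x2 window, stride 2

flattened = 32 * trace[-1] * trace[-1]   # 32 channels after conv2

print(trace)       # [32, 28, 14, 10, 5]
print(flattened)   # 800, the in_features of fc1
```

This is why fc1 is declared as nn.Linear(32*5*5, 120): changing the input resolution or a kernel size changes the flattened length and fc1 has to be adjusted to match.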

train

    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    import torchvision.transforms as transforms

    from model import LeNet


    def main():
        '''
        ToTensor: converts a PIL Image or numpy.ndarray (H x W x C) in the range
        [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
        Normalize: normalizes a tensor image with mean and standard deviation.
        Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
        channels, this transform will normalize each channel of the input ``torch.*Tensor``, i.e.
        ``output[channel] = (input[channel] - mean[channel]) / std[channel]``
        '''
        transform = transforms.Compose(
            [transforms.ToTensor(),
             transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

        # 50,000 training images.
        # On the first run, set download=True so the dataset is downloaded automatically.
        train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                                 download=False, transform=transform)
        # shuffle: whether to shuffle the dataset
        train_loader = torch.utils.data.DataLoader(train_set, batch_size=36,
                                                   shuffle=True, num_workers=0)

        # 10,000 validation images.
        # On the first run, set download=True so the dataset is downloaded automatically.
        val_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                               download=False, transform=transform)
        val_loader = torch.utils.data.DataLoader(val_set, batch_size=10000,
                                                 shuffle=False, num_workers=0)
        val_data_iter = iter(val_loader)            # create an iterator over val_loader
        val_image, val_label = next(val_data_iter)  # one batch = all validation images and labels

        classes = ('plane', 'car', 'bird', 'cat',
                   'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

        def imshow(img):
            img = img / 2 + 0.5  # unnormalize
            npimg = img.numpy()
            plt.imshow(np.transpose(npimg, (1, 2, 0)))
            plt.show()

        # print the labels of the first 4 validation images
        print(' '.join(f'{classes[val_label[j]]:5s}' for j in range(4)))
        # show the same 4 images (the batch holds all 10,000, so slice it)
        imshow(torchvision.utils.make_grid(val_image[:4]))

        net = LeNet()
        loss_function = nn.CrossEntropyLoss()
        optimizer = optim.Adam(net.parameters(), lr=0.001)

        for epoch in range(5):  # loop over the dataset multiple times
            running_loss = 0.0
            for step, data in enumerate(train_loader, start=0):
                # There are ceil(50000 / 36) = 1389 steps per epoch: the training set
                # has 50,000 images and 36 is the batch_size given to the DataLoader.

                # get the inputs; data is a list of [inputs, labels]
                inputs, labels = data

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward + backward + optimize
                outputs = net(inputs)                  # forward pass
                loss = loss_function(outputs, labels)  # compute the loss
                loss.backward()                        # backpropagation
                optimizer.step()                       # parameter update

                # print statistics
                running_loss += loss.item()
                # % is the modulo operator. step starts at 0, so comparing with 499
                # (not 0) reports right after the 500th, 1000th, ... mini-batch.
                if step % 500 == 499:  # print every 500 mini-batches
                    # Disable gradient tracking for evaluation; otherwise the forward
                    # pass over the validation set would also build the autograd graph
                    # and waste a lot of memory.
                    with torch.no_grad():
                        outputs = net(val_image)  # [batch, 10]
                        predict_y = torch.max(outputs, dim=1)[1]
                        accuracy = torch.eq(predict_y, val_label).sum().item() / val_label.size(0)

                        print('[%d, %5d] train_loss: %.3f test_accuracy: %.3f' %
                              (epoch + 1, step + 1, running_loss / 500, accuracy))
                        running_loss = 0.0

        print('Finished Training')

        save_path = './Lenet.pth'
        torch.save(net.state_dict(), save_path)


    if __name__ == '__main__':
        main()
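The arithmetic behind two details of the training script can be sketched in plain Python: what Normalize with mean 0.5 and std 0.5 does to a pixel value (and why imshow undoes it with img / 2 + 0.5), and how many steps and progress reports one epoch produces. The function and variable names below are chosen for this illustration only.

```python
import math

MEAN, STD = 0.5, 0.5  # per-channel values passed to transforms.Normalize above


def normalize(x):
    """Map a ToTensor value in [0, 1] to the range [-1, 1]."""
    return (x - MEAN) / STD


def unnormalize(y):
    """Inverse used by imshow: y / 2 + 0.5 equals y * STD + MEAN."""
    return y / 2 + 0.5


train_images, batch_size, report_every = 50000, 36, 500

steps_per_epoch = math.ceil(train_images / batch_size)   # the last batch is smaller
last_batch_size = train_images - (steps_per_epoch - 1) * batch_size
reports_per_epoch = steps_per_epoch // report_every
unreported_steps = steps_per_epoch % report_every

print(normalize(0.0), normalize(1.0), unnormalize(normalize(0.25)))
print(steps_per_epoch, last_batch_size, reports_per_epoch, unreported_steps)
```

With these settings each epoch prints twice (after steps 500 and 1000). The loss of the remaining 389 steps is accumulated but never printed, because running_loss is reset at the start of the next epoch.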

AlexNet(2012)

 

 

VGG(2014)

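Notes

  1. During training, if you do transfer learning from pretrained weights, the data preprocessing must first subtract [123.68, 116.78, 103.94] from each image. These are the per-channel (R, G, B) means over all ImageNet images, on the 0-255 pixel scale. When training from scratch, this subtraction is not needed. In general, the preprocessing has to match the one used when the pretrained weights were trained.

  2. num_workers: the number of worker processes (not threads) used to load data; 0 means the data is loaded in the main process. On Windows, values above 0 require the training entry point to be protected by if __name__ == '__main__':, so 0 is the safest setting there.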
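The mean subtraction in note 1 is a per-channel operation. A minimal sketch, assuming the image is already a float tensor of shape [3, H, W] in RGB order with values in [0, 255]; the function name `subtract_mean` and the gray test image are hypothetical:

```python
import torch

# Per-channel RGB means quoted in note 1 (0-255 pixel scale).
IMAGENET_MEAN_RGB = torch.tensor([123.68, 116.78, 103.94])


def subtract_mean(image):
    """Subtract the channel means from a [3, H, W] float tensor."""
    # view(3, 1, 1) lets each mean broadcast over the H and W dimensions.
    return image - IMAGENET_MEAN_RGB.view(3, 1, 1)


image = torch.full((3, 224, 224), 128.0)  # hypothetical uniform gray image
centered = subtract_mean(image)

print(centered.shape)
print(centered[:, 0, 0])  # one pixel: 128 minus each channel mean
```

Note that ToTensor() rescales pixels to [0, 1], so this subtraction would have to be applied to unscaled pixel values rather than after ToTensor().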

model

    import torch.nn as nn
    import torch

    # official pretrained weights
    model_urls = {
        'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
        'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
        'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
        'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
    }


    class VGG(nn.Module):
        def __init__(self, features, num_classes=1000, init_weights=False):
            super(VGG, self).__init__()
            self.features = features
            self.classifier = nn.Sequential(
                nn.Linear(512*7*7, 4096),
                nn.ReLU(True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, 4096),
                nn.ReLU(True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, num_classes)
            )
            if init_weights:
                self._initialize_weights()

        def forward(self, x):
            # N x 3 x 224 x 224
            x = self.features(x)
            # N x 512 x 7 x 7
            x = torch.flatten(x, start_dim=1)
            # N x 512*7*7
            x = self.classifier(x)
            return x

        def _initialize_weights(self):
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                    nn.init.xavier_uniform_(m.weight)
                    if m.bias is not None:
                        nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.Linear):
                    nn.init.xavier_uniform_(m.weight)
                    # nn.init.normal_(m.weight, 0, 0.01)
                    nn.init.constant_(m.bias, 0)


    def make_features(cfg: list):
        layers = []
        in_channels = 3
        for v in cfg:
            if v == "M":
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
                layers += [conv2d, nn.ReLU(True)]
                in_channels = v
        return nn.Sequential(*layers)  # unpack the list into positional arguments


    # number: output channels (number of kernels) of a conv layer; 'M': max-pooling layer
    cfgs = {
        'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],  # A
        'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],  # B
        'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],  # D
        'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],  # E
    }


    def vgg(model_name="vgg16", **kwargs):
        assert model_name in cfgs, "Warning: model number {} not in cfgs dict!".format(model_name)
        cfg = cfgs[model_name]
        model = VGG(make_features(cfg), **kwargs)
        return model
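The cfgs lists encode two things that can be checked without PyTorch: the number in each model name counts the weight layers (conv layers plus the three fully connected layers in the classifier), and the five 'M' entries reduce a 224x224 input to the 7x7 map that nn.Linear(512*7*7, 4096) expects. In the sketch below the dictionary is copied from the code above; the two helper functions are hypothetical.

```python
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M',
              512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M',
              512, 512, 512, 512, 'M'],
}


def count_weight_layers(cfg, num_fc=3):
    """Conv layers in cfg plus the fully connected layers of the classifier."""
    return sum(1 for v in cfg if v != 'M') + num_fc


def final_feature_size(cfg, input_size=224):
    """Spatial size after the feature extractor.

    A 3x3 conv with padding=1 keeps H and W unchanged; each 2x2 max pool
    with stride 2 halves them.
    """
    size = input_size
    for v in cfg:
        if v == 'M':
            size //= 2
    return size


for name, cfg in cfgs.items():
    print(name, count_weight_layers(cfg), final_feature_size(cfg))
```

Feeding an input whose side is not 224 changes the final feature size, so the first Linear layer of the classifier would no longer match.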

 GoogLeNet(2014)

 ResNet(2015)

 

ViT

Supplementary knowledge

Code
