
VGG Network Architecture Explained, with Receptive Field Calculation


Highlights of the network architecture:

Large convolution kernels are replaced by stacks of 3×3 kernels, which reduces the number of parameters. The paper notes that two stacked 3×3 kernels can replace one 5×5 kernel, and three stacked 3×3 kernels can replace one 7×7 kernel, while keeping the same receptive field.

So what is a receptive field?

In a convolutional neural network, the receptive field is the region of the input layer that a single element of some layer's output depends on. Put plainly, it is the size of the input-layer region that one unit of an output feature map corresponds to.

As an example, take a 9×9 input, apply a 3×3 convolution with stride 2, then a 2×2 max pool with stride 2. By the output-size formula N = (W − F + 2P)/S + 1:

(9 − 3 + 2×0)/2 + 1 = 4

(4 − 2 + 2×0)/2 + 1 = 2

One unit of the third layer (the topmost) therefore has a 2×2 receptive field on the second layer and a 5×5 receptive field on the first layer.

Receptive field formula:

F(i) = (F(i+1) − 1) × stride + Ksize

where F(i) is the receptive field of layer i, F(i+1) is the receptive field of the layer above it, stride is the stride of layer i, and Ksize is the size of layer i's convolution or pooling kernel.

Working backward through the example:

Feature map: F = 1

Pool1: F = (1 − 1) × 2 + 2 = 2

Conv1: F = (2 − 1) × 2 + 3 = 5
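
This backward recursion is easy to script. Here is a minimal sketch (a hypothetical helper, not from the original post) that applies F(i) = (F(i+1) − 1) × stride + Ksize layer by layer, starting from F = 1 at the final feature map:

def receptive_field(layers):
    # layers: (ksize, stride) pairs ordered from the input side to the output side
    f = 1  # one unit of the final feature map
    for ksize, stride in reversed(layers):  # walk back toward the input
        f = (f - 1) * stride + ksize
    return f

print(receptive_field([(3, 2), (2, 2)]))          # 5: Conv 3x3/s2 then MaxPool 2x2/s2, as above
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7: three stacked 3x3/s1 convolutions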

Why three stacked 3×3 kernels can replace one 7×7 kernel:

Feature map: F = 1

Conv3×3(3): F = (1 − 1) × 1 + 3 = 3

Conv3×3(2): F = (3 − 1) × 1 + 3 = 5

Conv3×3(1): F = (5 − 1) × 1 + 3 = 7

So one unit of the feature map produced by three stacked 3×3 convolutions sees the same region of the input as a single 7×7 convolution: a 7×7 receptive field in both cases.
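
The parameter saving mentioned at the start is also easy to check by hand: with C input and C output channels, three 3×3 layers use 3 × (3 × 3 × C × C) = 27C² weights, while a single 7×7 layer uses 7 × 7 × C × C = 49C² (ignoring biases), i.e. roughly 45% fewer parameters for the same receptive field.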

A rough look at the network structure first:

Taking VGG-16 as the example, the input is a 224×224 RGB image, which passes through:

two 3×3 convolution layers, then a max pool,
two more 3×3 convolution layers, then a max pool,
three 3×3 convolution layers, then a max pool,
three 3×3 convolution layers, then a max pool,
three 3×3 convolution layers, then a max pool,
three fully connected layers,
and finally a softmax.

That is 13 convolution layers plus 3 fully connected layers in total.

The convolutions all use stride 1 and padding 1, so they leave the feature map's height and width unchanged: by N = (W − F + 2P)/S + 1, a 3×3 input gives (3 − 3 + 2×1)/1 + 1 = 3, i.e. a 3×3 output, and the same holds for any input size.

The pooling kernels are 2×2 with stride 2, so each pooling step halves the height and width of the feature map without changing its depth; after the five pools, the 224×224 input shrinks to 224/2⁵ = 7, which is why the first fully connected layer expects 512×7×7 inputs. The depth of an output feature map equals the number of convolution kernels in that layer.
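
These shape rules are quick to verify with a throwaway PyTorch check (a sketch, not part of the original post):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # dummy RGB input batch
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

print(conv(x).shape)        # torch.Size([1, 64, 224, 224]): H and W unchanged, depth = 64 kernels
print(pool(conv(x)).shape)  # torch.Size([1, 64, 112, 112]): H and W halved, depth unchanged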

The three fully connected layers: the first two are followed by ReLU activations, while the last layer's 1000 nodes (one per ImageNet class) use no ReLU and are instead passed through a softmax layer.

Network implementation

The VGG network is split into two parts:

the feature-extraction part (everything before the fully connected layers);
the classification part (the three fully connected layers).

Model module

import torch.nn as nn
import torch

# official pretrain weights
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(  # the three fully connected layers
            nn.Linear(512*7*7, 4096),     # input is the flattened 512x7x7 feature map
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)  # flatten everything except the batch dimension
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():  # iterate over every submodule, i.e. every layer of the network
            if isinstance(m, nn.Conv2d):  # convolutional layer
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)  # initialize the kernel weights with the Xavier method
                if m.bias is not None:  # if the layer has a bias
                    nn.init.constant_(m.bias, 0)  # initialize it to 0
            elif isinstance(m, nn.Linear):  # fully connected layer
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_features(cfg: list):  # builds the feature-extraction network from a configuration list
    layers = []  # collects every layer we create
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]  # pair each conv layer with a ReLU
            in_channels = v  # the output depth equals the number of kernels
    return nn.Sequential(*layers)  # unpack the layer list as positional arguments


cfgs = {  # each key maps to one model configuration; vgg11 is configuration A (11 layers)
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],  # numbers are kernel counts, 'M' marks a max-pool layer
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],  # configuration B, 13 layers
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],  # configuration D, 16 layers
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],  # configuration E, 19 layers
}


def vgg(model_name="vgg16", **kwargs):  # instantiate a VGG network by name
    assert model_name in cfgs, "Warning: model name {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]
    model = VGG(make_features(cfg), **kwargs)  # **kwargs forwards a variable-length keyword dict to VGG
    return model
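
As a quick sanity check of the model definition (assuming the code above is saved as model.py):

import torch
from model import vgg

net = vgg(model_name="vgg16", num_classes=5, init_weights=True)
out = net(torch.randn(1, 3, 224, 224))  # one dummy 224x224 RGB image
print(out.shape)  # torch.Size([1, 5])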

Training module

import os
import sys
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),  # convert to a tensor
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
    print("using {} images for training, {} images for validation.".format(train_num, val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()

    model_name = "vgg16"
    net = vgg(model_name=model_name, num_classes=5, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 3
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1, epochs, loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate the number of correct predictions per epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
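
For reference, datasets.ImageFolder expects one sub-directory per class. Given the paths in the script, the expected layout is roughly the following (inferred from the code and its class-name comment, so treat the exact folder names as an assumption):

data_set/flower_data/
    train/
        daisy/  dandelion/  roses/  sunflower/  tulips/
    val/
        daisy/  dandelion/  roses/  sunflower/  tulips/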

Training results

Prediction module

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = vgg(model_name="vgg16", num_classes=5).to(device)
    # load model weights
    weights_path = "./vgg16Net.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()

Prediction results
