This article is mainly a set of study notes based on the videos of the Bilibili creator 霹雳吧啦Wz. The related materials, including the implementation code and some of the figures used here, are listed in the reference section at the end.
The whole project has been uploaded to my personal GitHub: https://github.com/lovewinds13/QYQXDeepLearning . You can download it and test it directly. The dataset files were removed because they are large; download them by following the instructions below.
Paper download: Very Deep Convolutional Networks For Large-Scale Image Recognition
The VGG network was proposed in 2014 by the well-known Visual Geometry Group (VGG) at the University of Oxford. It won first place in the Localization Task and second place in the Classification Task of that year's ImageNet competition.
VGG mainly studies how the depth of a convolutional neural network affects the accuracy of large-scale image recognition. It builds networks of various depths using only 3 x 3 convolution kernels, which reduces the number of parameters required. The most commonly used depths are VGG16 and VGG19.
The convolutional layer configurations of VGG at different depths are as follows:
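(The configuration table from the paper is not reproduced here. As a rough summary, consistent with the cfgs dictionary in the model code below: VGG11 has 8 convolutional layers + 3 fully connected layers, VGG13 has 10 + 3, VGG16 has 13 + 3, and VGG19 has 16 + 3; all configurations end with the same three fully connected layers.)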
Characteristics of VGG:
(1) The input is a 224 x 224 x 3 image;
(2) Convolution layers use stride = 1 and padding = 1;
(3) Max pooling (maxpool) uses size = 2 and stride = 2, and there are 5 max-pooling layers;
(4) Blocks of consecutive convolution layers are followed by a pooling layer, which reduces the resolution of the feature maps;
(5) Compared with AlexNet, the convolutional part is deeper;
(6) The structure of VGG is uniform: 3 x 3 convolution kernels and 2 x 2 pooling kernels, stacked repeatedly;
(7) VGG has more parameters and training it consumes more computing resources.
Definition: in a convolutional network, the receptive field is the size of the region of the input layer that corresponds to one element of a given layer's output. It describes how large a part of the original image the different neurons in the network "see", i.e., the size of the region in the original image onto which a pixel of each layer's output feature map is mapped.
It can be seen that two stacked 3 x 3 convolutions and a single 5 x 5 convolution both end up producing the same output feature map.
For the two 3 x 3 convolutions: after the first convolution, the output is a 3 x 3 feature map, and each of its pixels corresponds to a 3 x 3 region of the previous layer, i.e., the receptive field is 3. After the second 3 x 3 convolution, the output is a 1 x 1 feature map, and this pixel corresponds to a 5 x 5 region of the original image, i.e., the receptive field is 5.
For the 5 x 5 image, a single 5 x 5 convolution likewise outputs a 1 x 1 feature map, whose receptive field is also 5.
Receptive field calculation:
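The original figure for this step is not reproduced here; the commonly used recursion, working backwards from the output layer, is

F(i) = (F(i + 1) - 1) × Stride + Ksize

where F(i) is the receptive field of layer i, and Stride and Ksize are the stride and kernel size of layer i. Starting from a single output pixel (F = 1), the example above works out as:

(1) second 3 x 3 conv: F = (1 - 1) × 1 + 3 = 3
(2) first 3 x 3 conv: F = (3 - 1) × 1 + 3 = 5
(3) single 5 x 5 conv: F = (1 - 1) × 1 + 5 = 5

which matches the receptive fields of 3 and 5 described above.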
Stacking two 3 x 3 convolution kernels to replace one 5 x 5 kernel, and assuming the number of input and output channels is C:
Kernel size | Parameters |
---|---|
5 * 5 | 5 * 5 * C * C = 25C^2 |
3 * 3 | 3 * 3 * C * C + 3 * 3 * C * C = 18C^2 |
For the same output feature map, the stacked small kernels clearly require far fewer parameters than the single large kernel.
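As a quick sanity check, these parameter counts can be reproduced directly in PyTorch. The snippet below is a minimal sketch (bias terms are disabled so the numbers match 25C² and 18C² exactly, and C = 64 is an arbitrary example):

```python
import torch.nn as nn

C = 64  # example channel count

conv5 = nn.Conv2d(C, C, kernel_size=5, bias=False)                    # one 5 x 5 convolution
conv3x2 = nn.Sequential(nn.Conv2d(C, C, kernel_size=3, bias=False),   # two stacked 3 x 3 convolutions
                        nn.Conv2d(C, C, kernel_size=3, bias=False))


def count(m):
    return sum(p.numel() for p in m.parameters())


print(count(conv5))    # 25 * 64 * 64 = 102400
print(count(conv3x2))  # 18 * 64 * 64 = 73728
```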
This article uses the flower classification dataset. Download link: http://download.tensorflow.org/example_images/flower_photos.tgz
For splitting the dataset into training and validation sets, refer to the earlier post pytorch图像分类篇:3.搭建AlexNet并训练花分类数据集 (building AlexNet and training it on the flower dataset); a sketch of such a split script is given below.
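The referenced post splits flower_photos into separate train and val folders. The sketch below shows the general idea (folder names and the 10% validation ratio are assumptions; adjust them to match the referenced tutorial):

```python
import os
import random
import shutil

random.seed(0)
split_rate = 0.1  # fraction of each class used for validation (assumption)

data_root = "./data_set/flower_data"
origin_path = os.path.join(data_root, "flower_photos")   # extracted dataset
classes = [c for c in os.listdir(origin_path)
           if os.path.isdir(os.path.join(origin_path, c))]

# create train/<class> and val/<class> folders
for split in ["train", "val"]:
    for cla in classes:
        os.makedirs(os.path.join(data_root, split, cla), exist_ok=True)

# copy a random 10% of each class into val, the rest into train
for cla in classes:
    images = os.listdir(os.path.join(origin_path, cla))
    val_images = set(random.sample(images, k=int(len(images) * split_rate)))
    for img in images:
        dst_split = "val" if img in val_images else "train"
        shutil.copy(os.path.join(origin_path, cla, img),
                    os.path.join(data_root, dst_split, cla, img))
    print(f"class {cla}: {len(images)} images processed.")
```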
To switch from CPU training to GPU training, a separate file, train_gpu.py, was created with the necessary modifications on top of the CPU training script.
""" VGG模型 """ import torch.nn as nn import torch # official pretrain weights model_urls = { 'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth', 'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth', 'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth', 'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth' } cfgs = { 'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'], } class VGG(nn.Module): def __init__(self, features, num_classes=1000, init_weights=False): super(VGG, self).__init__() self.features = features self.classifier = nn.Sequential( nn.Linear(512*7*7, 2048), # 第1线性层, 2048 减少参数 nn.ReLU(True), nn.Dropout(p=0.5), nn.Linear(2048, 2048), # 第2线性层 nn.ReLU(True), nn.Dropout(p=0.5), nn.Linear(2048, num_classes), # 第3线性层 ) if init_weights: self._initialize_weights() def forward(self, x): x = self.features(x) # N x 3 x 224 x 224 x = torch.flatten(x, start_dim=1) # N x 512 x 7 x 7 x = self.classifier(x) # N x 512*7*7 return x def _initialize_weights(self): for m in self.modules(): if isinstance(m, nn.Conv2d): nn.init.xavier_uniform_(m.weight) if m.bias is not None: nn.init.constant_(m.bias, 0) elif isinstance(m, nn.Linear): nn.init.xavier_uniform_(m.weight) nn.init.constant_(m.bias, 0) def make_features(cfg: list): layers = [] in_channels = 3 for v in cfg: if v == "M": layers += [nn.MaxPool2d(kernel_size=2, stride=2)] else: conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1) layers += [conv2d, nn.ReLU(True)] in_channels = v return nn.Sequential(*layers) def vgg(model_name="vgg16", **kwargs): assert model_name in cfgs, "Warning: model number {} not in cfgs dist!".format(model_name) cfg = cfgs[model_name] model = VGG(make_features(cfg), **kwargs) return model """ 测试模型 """ # if __name__ == '__main__': # input1 = torch.rand([224, 3, 224, 224]) # model_name = "vgg16" # model_x = vgg(model_name=model_name, num_classes=5, init_weights=True) # print(model_x) # output = AlexNet(input1)
The fully connected layers here use a width of 2048 instead of VGG's original 4096, i.e., only half the size, to reduce the number of parameters.
The tensor is flattened directly with the torch.flatten method:

```python
x = torch.flatten(x, start_dim=1)   # flatten to a 2-D tensor, keeping the batch dimension
```
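A quick interactive check of what this call does to the feature map (shapes as in the forward pass above):

```python
import torch

x = torch.rand(2, 512, 7, 7)        # N x 512 x 7 x 7 feature map
y = torch.flatten(x, start_dim=1)   # keep the batch dimension, flatten the rest
print(y.shape)                      # torch.Size([2, 25088]); 25088 = 512 * 7 * 7
```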
""" 训练(CPU) """ import os import sys import json import time import torch import torch.nn as nn from torchvision import transforms, datasets, utils import matplotlib.pyplot as plt import numpy as np import torch.optim as optim from tqdm import tqdm # 显示进度条模块 from model import vgg
```python
data_transform = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),   # random crop, then resize to 224*224
        transforms.RandomHorizontalFlip(),   # random horizontal flip
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]),
    "val": transforms.Compose([
        transforms.Resize((224, 224)),       # note: a tuple (224, 224)
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
}
```
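As a quick check of what the training transform produces (this reuses the data_transform dictionary defined above; the image path is just an example file):

```python
from PIL import Image

img = Image.open("./daisy01.jpg")           # any RGB image
tensor_img = data_transform["train"](img)   # random crop/flip, to tensor, normalize
print(tensor_img.shape)                     # torch.Size([3, 224, 224])
```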
```python
# data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # data root directory
data_root = os.path.abspath(os.path.join(os.getcwd(), "./"))
image_path = os.path.join(data_root, "data_set", "flower_data")
# image_path = data_root + "/data_set/flower_data/"
assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
```
Compared with the original tutorial, the data-reading path has been modified here.
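Based on how data_root and image_path are constructed above, the expected directory layout is roughly the following (a sketch; the five class sub-folders come from the extracted and split flower_photos dataset):

```
<working directory>
└── data_set
    └── flower_data
        ├── train
        │   ├── daisy
        │   ├── dandelion
        │   ├── roses
        │   ├── sunflowers
        │   └── tulips
        └── val
            ├── daisy
            ├── dandelion
            ├── roses
            ├── sunflowers
            └── tulips
```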
```python
batch_size = 32
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of dataloader workers (same as in train_gpu.py)

train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                     transform=data_transform["train"]
                                     )
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=nw
                                           )
val_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                   transform=data_transform["val"]
                                   )
val_num = len(val_dataset)
val_loader = torch.utils.data.DataLoader(val_dataset,
                                         batch_size=4,
                                         shuffle=False,
                                         num_workers=nw
                                         )

flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())   # invert: index -> class name
json_str = json.dumps(cla_dict, indent=4)
with open("class_indices.json", 'w') as json_file:
    json_file.write(json_str)
```
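For the flower dataset, class_to_idx follows ImageFolder's alphabetical ordering of the sub-folder names, so the generated class_indices.json should look roughly like this (shown for reference; regenerate it from your own data):

```
{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}
```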
model_name = "vgg16" net = vgg(model_name=model_name, num_classes=5, init_weights=True) # 实例化网络(5分类) # net.to(device) net.to("cpu") # 直接指定 cpu loss_function = nn.CrossEntropyLoss() # 交叉熵损失 optimizer = optim.Adam(net.parameters(), lr=0.0001) # 优化器(训练参数, 学习率) epochs = 10 # 训练轮数 save_path = "./VGGNet.pth" best_accuracy = 0.0 train_steps = len(train_loader) for epoch in range(epochs): net.train() # 开启Dropout running_loss = 0.0 train_bar = tqdm(train_loader, file=sys.stdout) # 设置进度条图标 for step, data in enumerate(train_bar): # 遍历训练集, images, labels = data # 获取训练集图像和标签 optimizer.zero_grad() # 清除历史梯度 outputs = net(images) # 正向传播 loss = loss_function(outputs, labels) # 计算损失值 loss.backward() # 方向传播 optimizer.step() # 更新优化器参数 running_loss += loss.item() train_bar.desc = "train epoch [{}/{}] loss:{:.3f}".format(epoch + 1, epochs, loss ) # 验证 net.eval() # 关闭Dropout acc = 0.0 with torch.no_grad(): val_bar = tqdm(val_loader, file=sys.stdout) for val_data in val_bar: val_images, val_labels = val_data outputs = net(val_images) predict_y = torch.max(outputs, dim=1)[1] acc += torch.eq(predict_y, val_labels).sum().item() val_accuracy = acc / val_num print("[epoch %d ] train_loss: %3f val_accurancy: %3f" % (epoch + 1, running_loss / train_steps, val_accuracy)) if val_accuracy > best_accuracy: # 保存准确率最高的 best_accuracy = val_accuracy torch.save(net.state_dict(), save_path) print("Finshed Training.")
Console output during training:
GPU training code: compared with the CPU version, it only adds moving the model and the data to the GPU device.
""" 训练(GPU) """ import os import sys import json import time import torch import torch.nn as nn from torchvision import transforms, datasets, utils import matplotlib.pyplot as plt import numpy as np import torch.optim as optim from tqdm import tqdm from model import vgg def main(): device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f"use device is {device}") data_transform = { "train": transforms.Compose([ transforms.RandomResizedCrop(224), transforms.RandomHorizontalFlip(), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), ]), "val": transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), ]) } # data_root = os.path.abspath(os.path.join(os.getcwd(), "../..")) # 读取数据路径 data_root = os.path.abspath(os.path.join(os.getcwd(), "./")) image_path = os.path.join(data_root, "data_set", "flower_data") # image_path = data_root + "/data_set/flower_data/" assert os.path.exists(image_path), "{} path does not exist.".format(image_path) train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"), transform=data_transform["train"] ) train_num = len(train_dataset) flower_list = train_dataset.class_to_idx cla_dict = dict((val, key) for key, val in flower_list.items()) json_str = json.dumps(cla_dict, indent=4) with open("calss_indices.json", 'w') as json_file: json_file.write(json_str) batch_size = 32 nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # 线程数计算 nw = 0 print(f"Using {nw} dataloader workers every process.") train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=nw ) val_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"), transform=data_transform["val"] ) val_num = len(val_dataset) val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=4, shuffle=False, num_workers=nw ) print(f"Using {train_num} images for training, {val_num} images for validation.") # test_data_iter = iter(val_loader) # test_image, test_label = next(test_data_iter) """ 测试数据集图片""" # def imshow(img): # img = img / 2 + 0.5 # np_img = img.numpy() # plt.imshow(np.transpose(np_img, (1, 2, 0))) # plt.show() # print(' '.join('%5s' % cla_dict[test_label[j].item()] for j in range(4))) # imshow(utils.make_grid(test_image)) model_name = "vgg16" net = vgg(model_name=model_name, num_classes=5, init_weights=True) # 实例化网络(5分类) net.to(device) loss_function = nn.CrossEntropyLoss() optimizer = optim.Adam(net.parameters(), lr=0.0001) epochs = 10 save_path = "./VGGNet.pth" best_accuracy = 0.0 train_steps = len(train_loader) for epoch in range(epochs): net.train() running_loss = 0.0 train_bar = tqdm(train_loader, file=sys.stdout) for step, data in enumerate(train_bar): images, labels = data optimizer.zero_grad() outputs = net(images.to(device)) loss = loss_function(outputs, labels.to(device)) loss.backward() optimizer.step() running_loss += loss.item() train_bar.desc = "train epoch [{}/{}] loss:{:.3f}".format(epoch + 1, epochs, loss ) # 验证 net.eval() acc = 0.0 with torch.no_grad(): val_bar = tqdm(val_loader, file=sys.stdout) for val_data in val_bar: val_images, val_labels = val_data outputs = net(val_images.to(device)) predict_y = torch.max(outputs, dim=1)[1] acc += torch.eq(predict_y, val_labels.to(device)).sum().item() val_accuracy = acc / val_num print("[epoch %d ] train_loss: %3f val_accurancy: %3f" % (epoch + 1, running_loss / train_steps, val_accuracy)) if val_accuracy > best_accuracy: best_accuracy = 
val_accuracy torch.save(net.state_dict(), save_path) print("Finshed Training.")
Training VGG on a GPU requires a relatively large amount of GPU memory.
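If you are not sure whether your GPU has enough memory, reducing batch_size is the main knob; the small check below (a sketch using standard torch.cuda calls) prints the total and currently used memory:

```python
import torch

if torch.cuda.is_available():
    prop = torch.cuda.get_device_properties(0)
    print(f"{prop.name}: {prop.total_memory / 1024**3:.1f} GB total")
    # run after the model and one batch have been moved to the GPU
    print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
```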
""" 预测 """ import os import json import torch from PIL import Image from torchvision import transforms import matplotlib.pyplot as plt from model import vgg def main(): device = torch.device("cuda" if torch.cuda.is_available() else "cpu") data_transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) ]) image_path = "./daisy01.jpg" img = Image.open(image_path) plt.imshow(img) img = data_transform(img) # [N, C H, W] img = torch.unsqueeze(img, dim=0) # 维度扩展 # print(f"img={img}") json_path = "./calss_indices.json" with open(json_path, 'r') as f: class_indict = json.load(f) # model = AlexNet(num_classes=5).to(device) # GPU model = vgg(model_name="vgg16", num_classes=5) # CPU weights_path = "./VGGNet.pth" model.load_state_dict(torch.load(weights_path)) model.eval() # 关闭 Dorpout with torch.no_grad(): # output = torch.squeeze(model(img.to(device))).cpu() #GPU output = torch.squeeze(model(img)) # 维度压缩 predict = torch.softmax(output, dim=0) predict_cla = torch.argmax(predict).numpy() print_res = "class: {} prob: {:.3}".format(class_indict[str(predict_cla)], predict[predict_cla].numpy()) plt.title(print_res) # for i in range(len(predict)): # print("class: {} prob: {:.3}".format(class_indict[str(predict_cla)], # predict[predict_cla].numpy())) plt.show()
The prediction result is as follows: