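The VGG family has the following members:
Their performance is as follows:
What are top-1 and top-5?
Top-1 takes the class with the highest probability in the final probability vector as the prediction; the prediction counts as correct only if that highest-probability class is the true class.
Top-5 counts a prediction as correct if the true class appears among the five highest-probability classes in the final probability vector.
Top-1 and top-5 error are two metrics used in deep learning to evaluate a model's prediction error rate. The VGG paper explains them as follows: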
The former is a multi-class classification error, i.e. the proportion of incorrectly classified images; the latter is the main evaluation criterion used in ILSVRC, and is computed as the proportion of images such that the ground-truth category is outside the top-5 predicted categories.
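Top-1 error: suppose the model classifies an animal image (a cat) and outputs only one prediction. The probability that this single prediction is "cat" is the top-1 accuracy, and the probability that it is not "cat" is the top-1 error. Put simply, it is how often the model's single best guess is wrong.
Top-5 error: suppose the model classifies the same cat image but outputs five predictions. The probability that "cat" is among those five is the top-5 accuracy; conversely, the probability that none of the five is "cat" is the top-5 error. In general, the lower the top-1 and top-5 error, the better the model. Top-5 error is also never larger than top-1 error, since one guess is never more likely to contain the true class than five guesses.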
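To make the two definitions concrete, here is a minimal sketch in plain Python. The score table and labels are made-up numbers for illustration only, and `top_k_error` is a helper name of my own, not something from the VGG paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    wrong = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            wrong += 1
    return wrong / len(labels)


# Hypothetical class probabilities for 4 images over 6 classes
scores = [
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # true class 1: best guess is right
    [0.30, 0.05, 0.40, 0.10, 0.10, 0.05],  # true class 0: best guess wrong, but inside top-5
    [0.02, 0.30, 0.25, 0.20, 0.13, 0.10],  # true class 0: lowest score, outside top-5
    [0.10, 0.10, 0.10, 0.50, 0.10, 0.10],  # true class 3: best guess is right
]
labels = [1, 0, 0, 3]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print("top-1 error:", top1)  # 2 of 4 images missed -> 0.5
print("top-5 error:", top5)  # 1 of 4 images missed -> 0.25
```

The second image shows why top-5 error can only be lower or equal: a sample that is wrong at top-1 may still be rescued by the other four guesses, but never the other way around.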
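Comparing speed and accuracy shows that the 16-layer and 19-layer configurations, VGG16 and VGG19, offer the best trade-off.
VGG16 is a deep convolutional neural network for image classification and recognition. It was developed by the Visual Geometry Group (VGG) at the University of Oxford, from which it takes its name, and achieved strong results in the 2014 ImageNet image recognition challenge (ILSVRC 2014).
VGG16 has 13 convolutional layers and 3 fully connected layers, about 138 million trainable parameters in total. Its core idea is to increase network depth by stacking many small convolution kernels together with pooling layers, which improves the representation of image features. It uses relatively small 3x3 convolution kernels and 2x2 max-pooling windows, and every convolutional layer is followed by a ReLU activation.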
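The figure of roughly 138 million parameters can be checked with a little arithmetic. The sketch below assumes the standard ImageNet configuration (224x224x3 input, 1000 output classes) and counts weights plus biases for each layer; the constant and function names are my own.

```python
# (in_channels, out_channels) of the 13 convolutional layers, all with 3x3 kernels
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of the 3 fully connected layers
FC_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, k=3):
    # k*k*c_in weights per output channel, plus one bias per output channel
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output
    return n_in * n_out + n_out


conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in CONV_LAYERS)
fc_total = sum(fc_params(n_in, n_out) for n_in, n_out in FC_LAYERS)
total = conv_total + fc_total

print("convolutional layers:", conv_total)   # 14714688
print("fully connected layers:", fc_total)   # 123642856
print("total:", total)                       # 138357544
```

Most of the parameters sit in the fully connected layers, and the first one alone (25088 to 4096) accounts for more than 100 million of them.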
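VGG16's structure is relatively simple and classic, and it is one of the commonly used baseline models in deep learning. It performs well on image classification and can effectively recognize and distinguish different object categories. Thanks to its simple structure and extensibility, VGG16 is also often used as the base model for transfer learning and plays an important role in a range of computer vision tasks such as object detection and image segmentation.
The Keras model summary below shows the network structure of a VGG16-style variant adapted to CIFAR-10: the input is 32x32x3, batch normalization and dropout are added after each convolutional block, and the output has 10 classes.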
Layer (type)                    Output Shape            Param #
=================================================================
input_1 (InputLayer)            (None, 32, 32, 3)       0
conv1_1 (Conv2D)                (None, 32, 32, 64)      1792
conv1_2 (Conv2D)                (None, 32, 32, 64)      36928
batch_normalization_1 (Batch    (None, 32, 32, 64)      256
max_pooling2d_1 (MaxPooling2    (None, 16, 16, 64)      0
dropout_1 (Dropout)             (None, 16, 16, 64)      0
conv2_1 (Conv2D)                (None, 16, 16, 128)     73856
conv2_2 (Conv2D)                (None, 16, 16, 128)     147584
batch_normalization_2 (Batch    (None, 16, 16, 128)     512
max_pooling2d_2 (MaxPooling2    (None, 8, 8, 128)       0
dropout_2 (Dropout)             (None, 8, 8, 128)       0
conv3_1 (Conv2D)                (None, 8, 8, 256)       295168
conv3_2 (Conv2D)                (None, 8, 8, 256)       590080
conv3_3 (Conv2D)                (None, 8, 8, 256)       590080
batch_normalization_3 (Batch    (None, 8, 8, 256)       1024
max_pooling2d_3 (MaxPooling2    (None, 4, 4, 256)       0
dropout_3 (Dropout)             (None, 4, 4, 256)       0
conv4_1 (Conv2D)                (None, 4, 4, 512)       1180160
conv4_2 (Conv2D)                (None, 4, 4, 512)       2359808
conv4_3 (Conv2D)                (None, 4, 4, 512)       2359808
batch_normalization_4 (Batch    (None, 4, 4, 512)       2048
max_pooling2d_4 (MaxPooling2    (None, 2, 2, 512)       0
dropout_4 (Dropout)             (None, 2, 2, 512)       0
conv5_1 (Conv2D)                (None, 2, 2, 512)       2359808
conv5_2 (Conv2D)                (None, 2, 2, 512)       2359808
conv5_3 (Conv2D)                (None, 2, 2, 512)       2359808
batch_normalization_5 (Batch    (None, 2, 2, 512)       2048
max_pooling2d_5 (MaxPooling2    (None, 1, 1, 512)       0
dropout_5 (Dropout)             (None, 1, 1, 512)       0
flatten_1 (Flatten)             (None, 512)             0
dense_1 (Dense)                 (None, 4096)            2101248
activation_1 (Activation)       (None, 4096)            0
dropout_6 (Dropout)             (None, 4096)            0
dense_2 (Dense)                 (None, 10)              40970
activation_2 (Activation)       (None, 10)              0
=================================================================
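The original VGG16 consists of five convolutional blocks (13 convolutional layers in total), three fully connected layers, and a softmax output layer. The blocks are separated by max-pooling, and all hidden layers use the ReLU activation, as shown in the figure.
The input image is 224x224x3. It passes through two convolutions, each with 64 3x3 kernels (the first operating on 3 input channels), stride 1, and padding=same, each followed by ReLU; the output size is 224x224x64.
Max pooling with a 2x2 window and stride 2 halves the spatial size, giving 112x112x64.
Two convolutions with 128 3x3 kernels, each followed by ReLU, give 112x112x128.
Max pooling gives 56x56x128.
Three convolutions with 256 3x3 kernels, each followed by ReLU, give 56x56x256.
Max pooling gives 28x28x256.
Three convolutions with 512 3x3 kernels, each followed by ReLU, give 28x28x512.
Max pooling gives 14x14x512.
Three convolutions with 512 3x3 kernels, each followed by ReLU, give 14x14x512.
Max pooling gives 7x7x512.
Flatten() then flattens the data into a one-dimensional vector of length 512 x 7 x 7 = 25088.
Next come three fully connected layers: two of size 1x1x4096, each followed by ReLU, and one of size 1x1x1000.
Finally, softmax outputs the 1000 class probabilities.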
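The size bookkeeping in the steps above can be reproduced with the usual output-size formula, floor((n + 2p - k) / s) + 1. This is a minimal sketch rather than a model definition; the helper names and the `BLOCKS` table are my own, and it assumes padding 1 for the 3x3 convolutions (the equivalent of padding=same at stride 1) and no padding for pooling.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    # Output height/width of a convolution
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    # Output height/width of an unpadded max-pooling layer
    return (size - kernel) // stride + 1


# (number of convolutions, output channels) for the five blocks of VGG16
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(size):
    """Return the (height, width, channels) shape after each block's pooling layer."""
    shapes = []
    for n_convs, channels in BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)   # 3x3, stride 1, padding 1 keeps the size
        size = pool_out(size)       # 2x2, stride 2 halves the size
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(224)
for shape in shapes:
    print(shape)

h, w, c = shapes[-1]
flat = h * w * c
print("flattened length:", flat)  # 25088
```

Calling `trace_shapes(32)` instead ends at (1, 1, 512), which agrees with the 512-element flatten layer in the CIFAR-10 summary shown earlier.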
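In AlexNet, each convolutional stage contains a single convolution, and the early layers use large kernels (11x11 in the first layer). In VGGNet, each convolutional block contains 2 to 4 convolutions with 3x3 kernels and stride 1, followed by 2x2 pooling with stride 2. VGGNet's most visible improvement is to shrink the convolution kernels and increase the number of convolutional layers.
The reasoning behind this design, looking at the architecture layer by layer, is as follows.
Stacking several convolutional layers with small kernels in place of a single layer with a large kernel reduces the number of parameters: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, and three cover 7x7. The authors also argue that the stack provides more nonlinear mappings, since each layer has its own ReLU, which increases the network's expressive power.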
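To put numbers on the parameter saving, here is a small sketch that compares a stack of 3x3 convolutions with a single larger kernel covering the same receptive field. The channel count of 512 is chosen only for illustration, biases are ignored, input and output channel counts are assumed equal, and the function names are my own.

```python
def receptive_field(n_layers, kernel=3):
    # For stride-1 convolutions, each extra layer widens the field by (kernel - 1)
    return 1 + n_layers * (kernel - 1)


def stacked_weights(n_layers, channels, kernel=3):
    # n_layers convolutions, each mapping `channels` to `channels`
    return n_layers * kernel * kernel * channels * channels


def single_weights(kernel, channels):
    # One convolution with a large kernel, mapping `channels` to `channels`
    return kernel * kernel * channels * channels


channels = 512  # illustrative value
for n_layers in (2, 3):
    field = receptive_field(n_layers)
    small = stacked_weights(n_layers, channels)
    large = single_weights(field, channels)
    print(f"{n_layers} x (3x3) vs one {field}x{field}: "
          f"{small} vs {large} weights, ratio {small / large:.2f}")
```

For three layers this is 27C^2 against 49C^2 weights, so the stacked version needs a little over half the parameters while also applying three nonlinearities instead of one.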
Writing code that follows the architecture above, one layer after another, gives the following network model:
import torch.nn as nn
import torch.nn.functional as F


class _VGG16_(nn.Module):

    def __init__(self):
        super(_VGG16_, self).__init__()
        # Assumes a 3 * 224 * 224 input image (channels * height * width).
        # Every 3x3 convolution uses stride 1 and padding 1, so it keeps the
        # height and width; every 2x2 max-pooling with stride 2 halves them.
        self.conv1_1 = nn.Conv2d(3, 64, 3, stride=1, padding=1)
        self.conv1_2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.max_pooling_1 = nn.MaxPool2d(2, stride=2)  # 64 * 112 * 112

        self.conv2_1 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.conv2_2 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
        self.max_pooling_2 = nn.MaxPool2d(2, stride=2)  # 128 * 56 * 56

        self.conv3_1 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
        self.conv3_2 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
        self.conv3_3 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
        self.max_pooling_3 = nn.MaxPool2d(2, stride=2)  # 256 * 28 * 28

        self.conv4_1 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
        self.conv4_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv4_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.max_pooling_4 = nn.MaxPool2d(2, stride=2)  # 512 * 14 * 14

        self.conv5_1 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv5_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv5_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.max_pooling_5 = nn.MaxPool2d(2, stride=2)  # 512 * 7 * 7

        self.fc1 = nn.Linear(7 * 7 * 512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 10)  # 10 classes (e.g. CIFAR-10); use 1000 for ImageNet

    def forward(self, x):
        # Block 1: two convolutions, each followed by ReLU, then pooling
        x = F.relu(self.conv1_1(x))
        x = F.relu(self.conv1_2(x))
        x = self.max_pooling_1(x)

        # Block 2
        x = F.relu(self.conv2_1(x))
        x = F.relu(self.conv2_2(x))
        x = self.max_pooling_2(x)

        # Block 3
        x = F.relu(self.conv3_1(x))
        x = F.relu(self.conv3_2(x))
        x = F.relu(self.conv3_3(x))
        x = self.max_pooling_3(x)

        # Block 4
        x = F.relu(self.conv4_1(x))
        x = F.relu(self.conv4_2(x))
        x = F.relu(self.conv4_3(x))
        x = self.max_pooling_4(x)

        # Block 5
        x = F.relu(self.conv5_1(x))
        x = F.relu(self.conv5_2(x))
        x = F.relu(self.conv5_3(x))
        x = self.max_pooling_5(x)

        # Flatten to (batch, 25088), then the three fully connected layers
        x = x.view(-1, 7 * 7 * 512)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        # Softmax over the class dimension returns probabilities. When training
        # with nn.CrossEntropyLoss, return the raw fc3 output instead, because
        # that loss applies log-softmax itself.
        x = F.softmax(x, dim=1)

        return x
Below is the complete TensorFlow code, which trains on top of the existing pretrained VGG16 model:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard

# Download and load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocess the data: scale pixel values to the range 0 to 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Build the VGG16 base with ImageNet weights and without its fully connected top
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Freeze the VGG16 weights so that only the new layers are trained
for layer in base_model.layers:
    layer.trainable = False

# Add custom fully connected layers on top of VGG16
x = Flatten()(base_model.output)
x = Dense(512, activation='relu')(x)
x = Dense(10, activation='softmax')(x)

# Create the new model
model = Model(base_model.input, x)

# Compile the model (labels are integer class indices, hence the sparse loss)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Set up callbacks: keep the best model by validation accuracy, and log to TensorBoard
checkpoint = ModelCheckpoint('vgg16_model.h5', save_best_only=True, save_weights_only=False, monitor='val_accuracy', mode='max')
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1)

# Train the model, using the test set as validation data
history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test), callbacks=[checkpoint, tensorboard])

# Print the results of the last epoch
print("Training accuracy:", history.history['accuracy'][-1])
print("Validation accuracy:", history.history['val_accuracy'][-1])

# Save the final model; note that this overwrites the best-epoch checkpoint
# written above with the last-epoch weights
model.save('vgg16_model.h5')
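The accuracy is somewhat low; you can tune the hyperparameters yourself.
The following code uses the trained weights to make a prediction: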
import numpy as np
from tensorflow.keras.models import load_model
from PIL import Image

# CIFAR-10 class labels, in the dataset's label-index order (0 to 9)
class_labels = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck',
]

# Load the trained VGG16 model
model = load_model('vgg16_model.h5')

# Load the image to classify
image_path = 'path'  # replace with the path to your own image
image = Image.open(image_path).convert('RGB')  # force 3 channels
image = image.resize((32, 32))  # resize to the same size as the training data
image = np.array(image) / 255.0  # apply the same 0-to-1 scaling used in training

# Run the classification on a batch containing this single image
predictions = model.predict(np.expand_dims(image, axis=0))
predicted_class_index = np.argmax(predictions)
predicted_class_label = class_labels[predicted_class_index]

# Print the prediction
print("Predicted label:", predicted_class_label)
References:
"Building a VGG16-based network in PyTorch for CIFAR-10 classification", shrinco's blog, CSDN
"[Deep Learning Notes] Top-5/Top-1 error rate", CSDN blog
"VGGNet-16 Architecture: A Complete Guide", Kaggle (kaggle.com)
"Keras: What are VGG16 and VGG19? #DeepLearning", Qiita
This article is intended only as study notes.