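Top-1 error and Top-5 error are two metrics used in deep learning to measure how often a model's predictions are wrong. The VGG paper explains them as follows:
The former is a multi-class classification error, i.e. the proportion of incorrectly classified images; the latter is the main evaluation criterion used in ILSVRC, and is computed as the proportion of images such that the ground-truth category is outside the top-5 predicted categories.
Top-1 error: suppose the model classifies an image of an animal (a cat) and outputs a single prediction. The proportion of images for which that one prediction is correct is the Top-1 accuracy, and the proportion for which it is wrong is the Top-1 error. Put simply, it is how often the model's best guess is wrong.
Top-5 error: suppose the model classifies the same cat image but outputs its five highest-scoring predictions. The proportion of images whose true class (cat) appears among those five is the Top-5 accuracy; conversely, the proportion whose true class is missing from all five is the Top-5 error. In general, the lower the Top-1 and Top-5 error, the better the model. Top-5 error is also never higher than Top-1 error: a correct top prediction is always among the top five, so five guesses can only hit at least as often as one.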
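To make the two definitions concrete, here is a minimal sketch in plain Python. The function name `top_k_error` and the toy scores are assumptions made up for illustration; they come from neither the VGG paper nor any library.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy scores for 4 images over 6 classes (made-up numbers for illustration).
scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # true class 0: top-1 hit
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0: top-1 miss, top-5 hit
    [0.25, 0.25, 0.20, 0.15, 0.10, 0.05],  # true class 5: miss even in top-5
    [0.05, 0.05, 0.60, 0.10, 0.10, 0.10],  # true class 2: top-1 hit
]
labels = [0, 0, 5, 2]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(f"top-1 error: {top1:.2f}, top-5 error: {top5:.2f}")
# prints "top-1 error: 0.50, top-5 error: 0.25"
```

Every top-1 hit is also a top-5 hit, which is why the second number can never exceed the first.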
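Weighing speed against accuracy, the 16-layer and 19-layer configurations (VGG16 and VGG19) were found to be the best depths.
VGG16 is a deep convolutional neural network for image classification and recognition. It was developed by a research team at the University of Oxford and is named after that team, the Visual Geometry Group (VGG). It achieved strong results in the 2014 ImageNet image recognition challenge.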
    Layer (type)                  Output Shape          Param #
    =================================================================
    input_1 (InputLayer)          (None, 32, 32, 3)     0
    conv1_1 (Conv2D)              (None, 32, 32, 64)    1792
    conv1_2 (Conv2D)              (None, 32, 32, 64)    36928
    batch_normalization_1 (Batch  (None, 32, 32, 64)    256
    max_pooling2d_1 (MaxPooling2  (None, 16, 16, 64)    0
    dropout_1 (Dropout)           (None, 16, 16, 64)    0
    conv2_1 (Conv2D)              (None, 16, 16, 128)   73856
    conv2_2 (Conv2D)              (None, 16, 16, 128)   147584
    batch_normalization_2 (Batch  (None, 16, 16, 128)   512
    max_pooling2d_2 (MaxPooling2  (None, 8, 8, 128)     0
    dropout_2 (Dropout)           (None, 8, 8, 128)     0
    conv3_1 (Conv2D)              (None, 8, 8, 256)     295168
    conv3_2 (Conv2D)              (None, 8, 8, 256)     590080
    conv3_3 (Conv2D)              (None, 8, 8, 256)     590080
    batch_normalization_3 (Batch  (None, 8, 8, 256)     1024
    max_pooling2d_3 (MaxPooling2  (None, 4, 4, 256)     0
    dropout_3 (Dropout)           (None, 4, 4, 256)     0
    conv4_1 (Conv2D)              (None, 4, 4, 512)     1180160
    conv4_2 (Conv2D)              (None, 4, 4, 512)     2359808
    conv4_3 (Conv2D)              (None, 4, 4, 512)     2359808
    batch_normalization_4 (Batch  (None, 4, 4, 512)     2048
    max_pooling2d_4 (MaxPooling2  (None, 2, 2, 512)     0
    dropout_4 (Dropout)           (None, 2, 2, 512)     0
    conv5_1 (Conv2D)              (None, 2, 2, 512)     2359808
    conv5_2 (Conv2D)              (None, 2, 2, 512)     2359808
    conv5_3 (Conv2D)              (None, 2, 2, 512)     2359808
    batch_normalization_5 (Batch  (None, 2, 2, 512)     2048
    max_pooling2d_5 (MaxPooling2  (None, 1, 1, 512)     0
    dropout_5 (Dropout)           (None, 1, 1, 512)     0
    flatten_1 (Flatten)           (None, 512)           0
    dense_1 (Dense)               (None, 4096)          2101248
    activation_1 (Activation)     (None, 4096)          0
    dropout_6 (Dropout)           (None, 4096)          0
    dense_2 (Dense)               (None, 10)            40970
    activation_2 (Activation)     (None, 10)            0
    =================================================================
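
The Param # column of the summary can be reproduced by hand, which is a useful check that the layer shapes are understood. The sketch below assumes 3x3 kernels with a bias term in every convolution, which is consistent with the listed counts. The helper names are hypothetical, not part of Keras.

```python
def conv2d_params(in_channels, out_channels, kernel_size=3):
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def batch_norm_params(channels):
    """Keras counts gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


# Values to compare against the Param # column of the summary.
checks = {
    "conv1_1": conv2d_params(3, 64),
    "conv1_2": conv2d_params(64, 64),
    "batch_normalization_1": batch_norm_params(64),
    "conv4_1": conv2d_params(256, 512),
    "dense_1": dense_params(512, 4096),
    "dense_2": dense_params(4096, 10),
}
for name, count in checks.items():
    print(f"{name:24s}{count:>10d}")
```

Pooling, dropout, flatten and activation layers have no weights, which is why their rows show 0.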

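With the standard 224x224x3 input (rather than the 32x32 CIFAR-10 input in the summary above), each block ends with a max pooling layer that uses a 2x2 filter and a stride of 2, which halves the spatial size:
After the first max pooling, the size becomes 112x112x64.
After the second max pooling, the size becomes 56x56x128.
After the third max pooling, the size becomes 28x28x256.
After the fourth max pooling, the size becomes 14x14x512.
After the fifth max pooling, the size becomes 7x7x512.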
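
The sizes listed above follow from the usual output-size formula, floor((n + 2p - k) / s) + 1. The sketch below assumes the standard VGG16 layout: 3x3 convolutions with stride 1 and padding 1, 2x2 pooling with stride 2 and no padding, and a 224x224 input. The helper `out_size` and the `blocks` list are illustrative names, not taken from any library.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (number of 3x3 convolutions, output channels) for the five VGG16 blocks
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size = 224  # assumed input height and width
shapes = []
for n_convs, channels in blocks:
    for _ in range(n_convs):
        # A 3x3 convolution with padding 1 keeps the spatial size unchanged.
        size = out_size(size, kernel=3, stride=1, padding=1)
    # A 2x2 max pooling with stride 2 halves it.
    size = out_size(size, kernel=2, stride=2)
    shapes.append((size, size, channels))
    print(f"after pooling: {size}x{size}x{channels}")

flattened = size * size * blocks[-1][1]
print("flattened features:", flattened)
```

The final 7x7x512 volume flattens to 25088 values, which is the input width of the first fully connected layer.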
    import torch.nn as nn
    import torch.nn.functional as F


    class _VGG16_(nn.Module):
        def __init__(self):
            super(_VGG16_, self).__init__()
            # Assumes a 3 x 224 x 224 input image (channels x height x width).
            # Unlike the original VGG16, the first convolution of each block has
            # no padding and trims 2 pixels (224 -> 222); the pooling layer's
            # padding=1 compensates, so each block still ends at half the size
            # it started with (222 -> 112).
            self.conv1_1 = nn.Conv2d(3, 64, 3)
            self.conv1_2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
            self.max_pooling_1 = nn.MaxPool2d(2, stride=2, padding=1)  # 64 x 112 x 112
            self.conv2_1 = nn.Conv2d(64, 128, 3)
            self.conv2_2 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
            self.max_pooling_2 = nn.MaxPool2d(2, stride=2, padding=1)  # 128 x 56 x 56
            self.conv3_1 = nn.Conv2d(128, 256, 3)
            self.conv3_2 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
            self.conv3_3 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
            self.max_pooling_3 = nn.MaxPool2d(2, stride=2, padding=1)  # 256 x 28 x 28
            self.conv4_1 = nn.Conv2d(256, 512, 3)
            self.conv4_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.conv4_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.max_pooling_4 = nn.MaxPool2d(2, stride=2, padding=1)  # 512 x 14 x 14
            self.conv5_1 = nn.Conv2d(512, 512, 3)
            self.conv5_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.conv5_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.max_pooling_5 = nn.MaxPool2d(2, stride=2, padding=1)  # 512 x 7 x 7
            self.fc1 = nn.Linear(7 * 7 * 512, 4096)
            self.fc2 = nn.Linear(4096, 4096)
            self.fc3 = nn.Linear(4096, 10)  # 10 output classes (CIFAR-10)

        def forward(self, x):
            # Block 1
            x = F.relu(self.conv1_1(x))
            x = F.relu(self.conv1_2(x))
            x = self.max_pooling_1(x)
            # Block 2
            x = F.relu(self.conv2_1(x))
            x = F.relu(self.conv2_2(x))
            x = self.max_pooling_2(x)
            # Block 3
            x = F.relu(self.conv3_1(x))
            x = F.relu(self.conv3_2(x))
            x = F.relu(self.conv3_3(x))
            x = self.max_pooling_3(x)
            # Block 4
            x = F.relu(self.conv4_1(x))
            x = F.relu(self.conv4_2(x))
            x = F.relu(self.conv4_3(x))
            x = self.max_pooling_4(x)
            # Block 5
            x = F.relu(self.conv5_1(x))
            x = F.relu(self.conv5_2(x))
            x = F.relu(self.conv5_3(x))
            x = self.max_pooling_5(x)
            # Flatten to (batch, 7 * 7 * 512) for the fully connected layers
            x = x.view(-1, 7 * 7 * 512)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            # Class probabilities over dim=1. When training with
            # nn.CrossEntropyLoss, return the raw logits instead, because that
            # loss applies log-softmax itself.
            x = F.softmax(x, dim=1)
            return x

    from tensorflow.keras.datasets import cifar10
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model
    from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard

    # Download and load the CIFAR-10 dataset
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    # Preprocess the data: scale pixel values to the range [0, 1]
    x_train = x_train / 255.0
    x_test = x_test / 255.0

    # Build the VGG16 convolutional base with ImageNet weights
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

    # Freeze the VGG16 weights
    for layer in base_model.layers:
        layer.trainable = False

    # Add custom fully connected layers on top of VGG16
    x = Flatten()(base_model.output)
    x = Dense(512, activation='relu')(x)
    x = Dense(10, activation='softmax')(x)

    # Create the new model
    model = Model(base_model.input, x)

    # Compile the model (labels are integer class indices, hence the sparse loss)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # Set up callbacks: keep only the checkpoint with the best validation accuracy
    checkpoint = ModelCheckpoint('vgg16_model.h5', save_best_only=True, save_weights_only=False, monitor='val_accuracy', mode='max')
    tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1)

    # Train the model (note: the test set doubles as the validation set here)
    history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test), callbacks=[checkpoint, tensorboard])

    # Print the results of the final epoch
    print("Training accuracy:", history.history['accuracy'][-1])
    print("Validation accuracy:", history.history['val_accuracy'][-1])

    # Save the final-epoch model under a separate name so that it does not
    # overwrite the best checkpoint written to vgg16_model.h5 above
    model.save('vgg16_model_final.h5')

    import numpy as np
    from tensorflow.keras.models import load_model
    from PIL import Image

    # CIFAR-10 class labels, in the dataset's label-index order (0 to 9)
    class_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer',
        'dog', 'frog', 'horse', 'ship', 'truck',
    ]

    # Load the trained VGG16 model
    model = load_model('vgg16_model.h5')

    # Load the image to classify
    image_path = 'path'  # replace with the path to your own image
    image = Image.open(image_path).convert('RGB')  # force 3 channels (drops alpha, expands grayscale)
    image = image.resize((32, 32))  # resize to match the training data
    # Apply the same preprocessing as in training: scale pixels to [0, 1]
    image = np.array(image) / 255.0

    # Run the classification
    predictions = model.predict(np.expand_dims(image, axis=0))
    predicted_class_index = np.argmax(predictions[0])
    predicted_class_label = class_labels[predicted_class_index]

    # Print the prediction
    print("Predicted label:", predicted_class_label)

Reference: "Building a VGG16-based network in PyTorch for CIFAR-10 classification", shrinco's blog, CSDN.
