3. Training on the CIFAR-10 Dataset



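  • conv3-64: a convolutional layer with a 3x3 kernel and 64 output channels; likewise, conv3-128 is a 3x3 convolution with 128 output channels.
  • FC-4096: a fully connected layer with 4096 nodes; likewise, FC-1000 is a fully connected layer with 1000 nodes.
  • maxpool: max pooling; VGG16 uses 2x2 max pooling with stride 2.
  • soft-max: the final softmax layer, which turns the scores of the last fully connected layer into class probabilities (it is not itself a fully connected layer).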
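To make the notation concrete, here is a minimal sketch that parses these tokens and counts the layers that carry weights. The list `VGG16_CONFIG` follows configuration D of the VGG paper; the names `parse_layer` and `VGG16_CONFIG` are made up for this illustration.

```python
# Configuration D (VGG16), written in the notation explained above.
VGG16_CONFIG = (
    ["conv3-64"] * 2 + ["maxpool"]
    + ["conv3-128"] * 2 + ["maxpool"]
    + ["conv3-256"] * 3 + ["maxpool"]
    + ["conv3-512"] * 3 + ["maxpool"]
    + ["conv3-512"] * 3 + ["maxpool"]
    + ["FC-4096", "FC-4096", "FC-1000", "soft-max"]
)


def parse_layer(token):
    """Turn one notation token into a small descriptive dict."""
    if token.startswith("conv"):
        # "conv3-64" -> 3x3 kernel, 64 output channels
        kernel, channels = token[len("conv"):].split("-")
        return {"type": "conv", "kernel": int(kernel), "out_channels": int(channels)}
    if token.startswith("FC-"):
        return {"type": "fc", "units": int(token[len("FC-"):])}
    if token == "maxpool":
        return {"type": "maxpool", "kernel": 2, "stride": 2}
    if token == "soft-max":
        return {"type": "softmax"}
    raise ValueError(f"unknown token: {token}")


layers = [parse_layer(t) for t in VGG16_CONFIG]
# Only convolutional and fully connected layers have weights.
weight_layers = [layer for layer in layers if layer["type"] in ("conv", "fc")]
print(len(weight_layers))
```

The count of 13 convolutional plus 3 fully connected layers is where the "16" in VGG16 comes from; pooling and softmax have no weights and are not counted.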




Top-1 and Top-5 error are two metrics used in deep learning to measure how often a model's predictions are wrong. The VGG paper explains them as follows:
The former is a multi-class classification error, i.e. the proportion of incorrectly classified images; the latter is the main evaluation criterion used in ILSVRC, and is computed as the proportion of images such that the ground-truth category is outside the top-5 predicted categories.

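Top-1 error: suppose the model classifies an animal picture (a cat) and outputs a single prediction. The fraction of images for which that one prediction is correct is the Top-1 accuracy; the fraction for which it is wrong is the Top-1 error. In short, it is how often the model's best guess is wrong.

Top-5 error: suppose the model classifies the same cat picture but outputs five predictions. The fraction of images whose true class (cat) appears among the five predictions is the Top-5 accuracy; the fraction whose true class is missing from all five is the Top-5 error.

In general, the lower the Top-1 and Top-5 error, the better the model. The Top-5 error is never larger than the Top-1 error: whenever the single best guess is right, the true class is also among the five best guesses, so being right with one guess is at least as hard as being right with five.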
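The two definitions can be sketched in a few lines of plain Python. The function name `top_k_error` and the toy scores are made up for illustration; with only three classes, k=2 stands in for the Top-5 case.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy scores for 4 images and 3 classes (hypothetical values).
scores = [
    [0.10, 0.60, 0.30],  # true class 1 is ranked first
    [0.50, 0.20, 0.30],  # true class 2 is ranked second
    [0.70, 0.20, 0.10],  # true class 2 is ranked last
    [0.20, 0.30, 0.50],  # true class 2 is ranked first
]
labels = [1, 2, 2, 2]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

Here the top-1 error is 0.5 and the top-2 error is 0.25: allowing more guesses can only turn misses into hits, never the other way round.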

Weighing speed against accuracy, VGG16 and VGG19 turn out to be the best configurations.



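VGG16 is a deep convolutional neural network for image classification and recognition. It was developed by a research team at the University of Oxford, the Visual Geometry Group (VGG), after which it is named, and it achieved strong results in the 2014 ImageNet image recognition challenge.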





Layer summary of a VGG16-style network for 32x32x3 inputs, in the format printed by Keras model.summary():

    Layer (type)                    Output Shape          Param #
    =================================================================
    input_1 (InputLayer)            (None, 32, 32, 3)     0
    conv1_1 (Conv2D)                (None, 32, 32, 64)    1792
    conv1_2 (Conv2D)                (None, 32, 32, 64)    36928
    batch_normalization_1 (Batch    (None, 32, 32, 64)    256
    max_pooling2d_1 (MaxPooling2    (None, 16, 16, 64)    0
    dropout_1 (Dropout)             (None, 16, 16, 64)    0
    conv2_1 (Conv2D)                (None, 16, 16, 128)   73856
    conv2_2 (Conv2D)                (None, 16, 16, 128)   147584
    batch_normalization_2 (Batch    (None, 16, 16, 128)   512
    max_pooling2d_2 (MaxPooling2    (None, 8, 8, 128)     0
    dropout_2 (Dropout)             (None, 8, 8, 128)     0
    conv3_1 (Conv2D)                (None, 8, 8, 256)     295168
    conv3_2 (Conv2D)                (None, 8, 8, 256)     590080
    conv3_3 (Conv2D)                (None, 8, 8, 256)     590080
    batch_normalization_3 (Batch    (None, 8, 8, 256)     1024
    max_pooling2d_3 (MaxPooling2    (None, 4, 4, 256)     0
    dropout_3 (Dropout)             (None, 4, 4, 256)     0
    conv4_1 (Conv2D)                (None, 4, 4, 512)     1180160
    conv4_2 (Conv2D)                (None, 4, 4, 512)     2359808
    conv4_3 (Conv2D)                (None, 4, 4, 512)     2359808
    batch_normalization_4 (Batch    (None, 4, 4, 512)     2048
    max_pooling2d_4 (MaxPooling2    (None, 2, 2, 512)     0
    dropout_4 (Dropout)             (None, 2, 2, 512)     0
    conv5_1 (Conv2D)                (None, 2, 2, 512)     2359808
    conv5_2 (Conv2D)                (None, 2, 2, 512)     2359808
    conv5_3 (Conv2D)                (None, 2, 2, 512)     2359808
    batch_normalization_5 (Batch    (None, 2, 2, 512)     2048
    max_pooling2d_5 (MaxPooling2    (None, 1, 1, 512)     0
    dropout_5 (Dropout)             (None, 1, 1, 512)     0
    flatten_1 (Flatten)             (None, 512)           0
    dense_1 (Dense)                 (None, 4096)          2101248
    activation_1 (Activation)       (None, 4096)          0
    dropout_6 (Dropout)             (None, 4096)          0
    dense_2 (Dense)                 (None, 10)            40970
    activation_2 (Activation)       (None, 10)            0
    =================================================================
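The "Param #" column can be reproduced by hand. The sketch below uses the usual counting rules (weights plus one bias per output for convolutional and dense layers, four values per channel for batch normalization); the helper names are made up for this illustration.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """kernel x kernel weights per input/output channel pair, plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


def dense_params(in_features, out_features):
    """One weight per input/output pair, plus one bias per output."""
    return in_features * out_features + out_features


print(conv2d_params(3, 3, 64))      # conv1_1
print(conv2d_params(3, 64, 64))     # conv1_2
print(batchnorm_params(64))         # batch_normalization_1
print(conv2d_params(3, 256, 512))   # conv4_1
print(dense_params(512, 4096))      # dense_1
print(dense_params(4096, 10))       # dense_2
```

These calls print 1792, 36928, 256, 1180160, 2101248 and 40970, matching the corresponding rows of the summary. Pooling, dropout, flatten and activation layers have no parameters, hence the zeros.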




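For the standard 224x224x3 input, each convolutional block ends with max pooling (2x2 filter, stride 2), which halves the height and width of the feature map:

  • after the 1st max pooling: 112x112x64
  • after the 2nd max pooling: 56x56x128
  • after the 3rd max pooling: 28x28x256
  • after the 4th max pooling: 14x14x512
  • after the 5th max pooling: 7x7x512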
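The sizes above follow from one rule: every pooling step halves height and width, while the channel count is set by the block's convolutions. A minimal sketch, assuming the convolutions themselves keep the spatial size unchanged (the names `BLOCK_CHANNELS` and `trace_pool_outputs` are made up for illustration):

```python
# Output channels of the five VGG16 convolutional blocks.
BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def trace_pool_outputs(size):
    """Return the (height, width, channels) shape after each of the five poolings."""
    shapes = []
    for channels in BLOCK_CHANNELS:
        size //= 2  # 2x2 max pooling with stride 2 halves the spatial size
        shapes.append((size, size, channels))
    return shapes


for shape in trace_pool_outputs(224):
    print(shape)
```

Calling `trace_pool_outputs(32)` instead gives 16, 8, 4, 2 and 1 for the spatial size, which is the progression shown in the 32x32 layer summary earlier in this section.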







3. Training on the CIFAR-10 Dataset


    # VGG16 defined layer by layer in PyTorch
    import torch.nn as nn
    import torch.nn.functional as F


    class _VGG16_(nn.Module):
        """VGG16 for 3 * 224 * 224 inputs with a 10-class output layer."""

        def __init__(self):
            super(_VGG16_, self).__init__()
            # Every 3x3 convolution uses padding=1, so it preserves the spatial size;
            # every 2x2 max pooling (stride 2, no padding) halves it.
            # Shape comments are channels * height * width for a 3 * 224 * 224 input.
            self.conv1_1 = nn.Conv2d(3, 64, 3, stride=1, padding=1)
            self.conv1_2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
            self.max_pooling_1 = nn.MaxPool2d(2, stride=2)  # 64 * 112 * 112
            self.conv2_1 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
            self.conv2_2 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
            self.max_pooling_2 = nn.MaxPool2d(2, stride=2)  # 128 * 56 * 56
            self.conv3_1 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
            self.conv3_2 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
            self.conv3_3 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
            self.max_pooling_3 = nn.MaxPool2d(2, stride=2)  # 256 * 28 * 28
            self.conv4_1 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
            self.conv4_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.conv4_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.max_pooling_4 = nn.MaxPool2d(2, stride=2)  # 512 * 14 * 14
            self.conv5_1 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.conv5_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.conv5_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
            self.max_pooling_5 = nn.MaxPool2d(2, stride=2)  # 512 * 7 * 7
            # fc1 assumes a 7 * 7 * 512 feature map, i.e. 224 * 224 inputs.
            # CIFAR-10 images are 32 * 32 and would have to be resized first.
            self.fc1 = nn.Linear(7 * 7 * 512, 4096)
            self.fc2 = nn.Linear(4096, 4096)
            self.fc3 = nn.Linear(4096, 10)  # 10 output classes for CIFAR-10

        def forward(self, x):
            # Block 1
            x = F.relu(self.conv1_1(x))
            x = F.relu(self.conv1_2(x))
            x = self.max_pooling_1(x)
            # Block 2
            x = F.relu(self.conv2_1(x))
            x = F.relu(self.conv2_2(x))
            x = self.max_pooling_2(x)
            # Block 3
            x = F.relu(self.conv3_1(x))
            x = F.relu(self.conv3_2(x))
            x = F.relu(self.conv3_3(x))
            x = self.max_pooling_3(x)
            # Block 4
            x = F.relu(self.conv4_1(x))
            x = F.relu(self.conv4_2(x))
            x = F.relu(self.conv4_3(x))
            x = self.max_pooling_4(x)
            # Block 5
            x = F.relu(self.conv5_1(x))
            x = F.relu(self.conv5_2(x))
            x = F.relu(self.conv5_3(x))
            x = self.max_pooling_5(x)
            # Flatten the feature map and classify
            x = x.view(-1, 7 * 7 * 512)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            # Return class probabilities. When training with nn.CrossEntropyLoss,
            # return the raw scores instead: that loss applies log-softmax itself.
            return F.softmax(x, dim=1)
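The first fully connected layer expects 7 * 7 * 512 input features, and that only holds for a particular input size. As a rough check, the sketch below applies the standard output-size formula floor((n + 2p - k) / s) + 1 to each layer, assuming the standard VGG settings (3x3 convolutions with padding 1, 2x2 pooling with stride 2 and no padding). The helper names are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, convs_per_block=(2, 2, 3, 3, 3), last_channels=512):
    """Number of features entering the first fully connected layer."""
    n = input_size
    for convs in convs_per_block:
        for _ in range(convs):
            n = out_size(n, kernel=3, stride=1, padding=1)  # size-preserving conv
        n = out_size(n, kernel=2, stride=2, padding=0)      # halving max pool
    return n * n * last_channels


print(flattened_features(224))  # 7 * 7 * 512
print(flattened_features(32))   # 1 * 1 * 512
```

Under these assumptions a 224x224 image yields 25088 features, while a raw 32x32 CIFAR-10 image yields only 512. To feed CIFAR-10 images to a network with a 7 * 7 * 512 first fully connected layer, either resize them to 224x224 or shrink that layer's input to 512.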


    from tensorflow.keras.datasets import cifar10
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model
    from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard

    # Download and load the CIFAR-10 dataset
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    # Preprocess the data: scale pixel values to the range [0, 1]
    x_train = x_train / 255.0
    x_test = x_test / 255.0

    # Build the VGG16 convolutional base with ImageNet weights
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

    # Freeze the VGG16 weights
    for layer in base_model.layers:
        layer.trainable = False

    # Add custom fully connected layers on top of VGG16
    x = Flatten()(base_model.output)
    x = Dense(512, activation='relu')(x)
    x = Dense(10, activation='softmax')(x)

    # Create the new model
    model = Model(base_model.input, x)

    # Compile the model (labels are integer class indices, hence the sparse loss)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # Callbacks: keep the model with the best validation accuracy, log to TensorBoard
    checkpoint = ModelCheckpoint('vgg16_model.h5', save_best_only=True, save_weights_only=False, monitor='val_accuracy', mode='max')
    tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1)

    # Train the model. The test set doubles as the validation set here,
    # so the validation accuracy is not an unbiased estimate.
    history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test), callbacks=[checkpoint, tensorboard])

    # Print the results of the last epoch
    print("Training accuracy:", history.history['accuracy'][-1])
    print("Validation accuracy:", history.history['val_accuracy'][-1])

    # Save the last-epoch model under a separate name so that it does not
    # overwrite the best checkpoint written by ModelCheckpoint
    model.save('vgg16_model_final.h5')




    import numpy as np
    from tensorflow.keras.models import load_model
    from PIL import Image

    # CIFAR-10 class labels, in label-index order (0 to 9)
    class_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer',
        'dog', 'frog', 'horse', 'ship', 'truck',
    ]

    # Load the trained VGG16 model
    model = load_model('vgg16_model.h5')

    # Load the image to classify
    image_path = 'path'  # replace with the path to your own image
    image = Image.open(image_path).convert('RGB')  # force 3 color channels
    image = image.resize((32, 32))  # resize to the size of the training images

    # Apply the same preprocessing as during training: scale pixels to [0, 1]
    image = np.array(image) / 255.0

    # Classify the image
    predictions = model.predict(np.expand_dims(image, axis=0))
    predicted_class_index = np.argmax(predictions)
    predicted_class_label = class_labels[predicted_class_index]

    # Print the prediction
    print("Predicted label:", predicted_class_label)
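The label list is easy to get subtly wrong: it has to follow the CIFAR-10 index order, and a missing comma between two string literals makes Python silently join them into one entry. The sketch below decodes a probability vector with a length check; `decode` and `argmax` are hypothetical helper names, and the probabilities are made-up values.

```python
# CIFAR-10 class names in label-index order (0 to 9).
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def decode(probabilities, labels=CIFAR10_LABELS):
    """Map a probability vector to its class name, refusing mismatched lengths."""
    if len(probabilities) != len(labels):
        raise ValueError(f"expected {len(labels)} probabilities, got {len(probabilities)}")
    return labels[argmax(probabilities)]


# Hypothetical model output for one image: class index 3 has the highest score.
probabilities = [0.01, 0.02, 0.05, 0.70, 0.02, 0.10, 0.03, 0.03, 0.02, 0.02]
print(decode(probabilities))

# Pitfall: adjacent string literals without a comma are concatenated.
broken = ["ship", "truck" "airplane"]
print(len(broken), broken[1])
```

The first print gives "cat"; the second shows that the broken list has two entries, the last being "truckairplane". Checking that the label list has exactly ten entries before indexing into it catches this kind of mistake early.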


