![[Pasted image 20240408160914.png]]
Convolutional layers
Max pooling
Activation functions
Image classification: VGG-16 was originally designed for image classification. It was trained on the ImageNet dataset and performs strongly on that task, so it can be applied in any scenario that requires classifying images, such as object recognition, animal recognition, or plant classification.
Feature extraction: the convolutional part of VGG-16 can serve as a feature extractor. By removing the fully connected layers and keeping only the convolution and pooling layers, it can be used as a pretrained backbone for other computer-vision tasks such as object detection and image segmentation. In that setting, the feature-extraction capacity of the convolutional stack is its main value.
Style transfer: VGG-16 is also used in image style transfer. Using it as a feature extractor and minimizing the difference between the feature representations of the input image and a target-style image produces an image rendered in the target style.
Image retrieval: VGG-16 feature representations can drive image retrieval. By converting each image to its VGG-16 feature vector and computing similarities between those vectors, one can find the images most similar to a given query image.
Medical image analysis: in medical imaging, VGG-16 can support diagnosis, disease detection, and segmentation, for example spotting signs of disease in X-ray images or classifying cells in pathology slides.
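The retrieval idea above can be sketched with cosine similarity over feature vectors. The 4096-dimensional vectors here are random stand-ins for real VGG-16 `fc` features, not actual network outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 4096))               # stand-ins for VGG-16 features
query = gallery[2] + 0.05 * rng.normal(size=4096)  # near-duplicate of image 2

scores = [cosine_similarity(query, g) for g in gallery]
best = int(np.argmax(scores))
print("most similar gallery image:", best)  # 2
```

In practice the gallery vectors would come from the penultimate layer of a pretrained VGG-16, but the ranking logic is exactly this.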
```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model

def VGG_16(input_shape=(224, 224, 3), num_classes=1000):
    # Input layer
    inputs = Input(shape=input_shape)

    # Block 1: two convolutional layers followed by max pooling
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2))(x)

    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2))(x)

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2))(x)

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2))(x)

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2))(x)

    # Fully connected layers
    x = Flatten()(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    # Build the model
    model = Model(inputs, outputs, name='VGG-16')
    return model

# Create the VGG-16 model and print its summary
vgg16_model = VGG_16()
vgg16_model.summary()
```
Output:
```
Model: "VGG-16"
_________________________________________________________________
 Layer (type)                    Output Shape            Param #
=================================================================
 input_1 (InputLayer)            [(None, 224, 224, 3)]   0
 conv2d (Conv2D)                 (None, 224, 224, 64)    1792
 conv2d_1 (Conv2D)               (None, 224, 224, 64)    36928
 max_pooling2d (MaxPooling2D)    (None, 112, 112, 64)    0
 conv2d_2 (Conv2D)               (None, 112, 112, 128)   73856
 conv2d_3 (Conv2D)               (None, 112, 112, 128)   147584
 max_pooling2d_1 (MaxPooling2D)  (None, 56, 56, 128)     0
 conv2d_4 (Conv2D)               (None, 56, 56, 256)     295168
 conv2d_5 (Conv2D)               (None, 56, 56, 256)     590080
 conv2d_6 (Conv2D)               (None, 56, 56, 256)     590080
 max_pooling2d_2 (MaxPooling2D)  (None, 28, 28, 256)     0
 conv2d_7 (Conv2D)               (None, 28, 28, 512)     1180160
 conv2d_8 (Conv2D)               (None, 28, 28, 512)     2359808
 conv2d_9 (Conv2D)               (None, 28, 28, 512)     2359808
 max_pooling2d_3 (MaxPooling2D)  (None, 14, 14, 512)     0
 conv2d_10 (Conv2D)              (None, 14, 14, 512)     2359808
 conv2d_11 (Conv2D)              (None, 14, 14, 512)     2359808
 conv2d_12 (Conv2D)              (None, 14, 14, 512)     2359808
 max_pooling2d_4 (MaxPooling2D)  (None, 7, 7, 512)       0
 flatten (Flatten)               (None, 25088)           0
 dense (Dense)                   (None, 4096)            102764544
 dropout (Dropout)               (None, 4096)            0
 dense_1 (Dense)                 (None, 4096)            16781312
 dropout_1 (Dropout)             (None, 4096)            0
 dense_2 (Dense)                 (None, 1000)            4097000
=================================================================
Total params: 138357544 (527.79 MB)
Trainable params: 138357544 (527.79 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```
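As a sanity check on the summary, the parameter counts can be reproduced with plain arithmetic: a conv layer has (k·k·c_in + 1)·c_out parameters (weights plus one bias per filter, k = 3 throughout VGG-16), and a dense layer has (n_in + 1)·n_out:

```python
def conv_params(c_in, c_out, k=3):
    # k*k*c_in weights per filter, plus one bias per filter
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # one bias per output unit
    return (n_in + 1) * n_out

# Input/output channel counts of the 13 VGG-16 conv layers, in order
channels = [3, 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512]
widths   = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

total = sum(conv_params(cin, cout) for cin, cout in zip(channels, widths))
total += dense_params(7 * 7 * 512, 4096)   # flatten -> fc1
total += dense_params(4096, 4096)          # fc1 -> fc2
total += dense_params(4096, 1000)          # fc2 -> softmax
print(total)  # 138357544, matching the summary above
```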
![[Pasted image 20240408162627.png]]
Vanishing and exploding gradients: in a non-residual (plain) network, as depth grows, backpropagation is prone to vanishing or exploding gradients. This makes training difficult and can prevent the model from converging at all.
Difficulty training deep networks: because of vanishing and exploding gradients, plain networks are hard to train at depth. Deeper networks need more layers to extract abstract features, yet the performance of a plain network can degrade as depth increases, because gradient propagation becomes increasingly unstable.
Performance saturation: as layers are added, a plain network's accuracy tends to saturate or even drop rather than keep improving. Simply adding depth does not raise performance; structural changes to the network are needed instead.
Restricted feature flow: in a plain network, features must pass through every layer in sequence; there is no direct path carrying low-level features to higher layers. Information can be lost or blurred along the way, making it hard for the network to capture semantics at multiple levels.
Wasted parameters: because features cannot be passed through directly, every layer must relearn how to extract useful features from the previous layer's output. This can produce redundant parameters, with some layers learning features similar to those of the layer before, increasing training complexity and compute cost.
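The gradient problems above can be made concrete with a toy calculation. In a plain chain the gradient is a product of per-layer derivative factors f', while a residual block y = x + F(x) contributes a factor of (1 + f'), so the product cannot collapse to zero when each f' is small. The per-layer derivative of 0.1 is an illustrative value, not a measurement:

```python
depth = 20
layer_grad = 0.1  # illustrative per-layer derivative f'(x)

plain = 1.0
residual = 1.0
for _ in range(depth):
    plain *= layer_grad            # plain net: factor is f'
    residual *= 1.0 + layer_grad   # residual: d/dx [x + F(x)] = 1 + f'

print(f"plain chain gradient:    {plain:.3e}")     # ~1e-20, vanished
print(f"residual chain gradient: {residual:.3e}")  # ~6.7, still usable
```

This is of course a simplification of the real backward pass, but it captures why the identity shortcut keeps gradients flowing through very deep stacks.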
Vanishing gradients are a common problem when training deep neural networks. The main causes are:
Saturating activation functions: activations commonly used in deep networks (e.g. Sigmoid, Tanh) saturate for large positive or negative inputs, where their derivatives approach zero. During backpropagation these saturated activations shrink the gradient at every layer until it effectively vanishes.
Network structure and weight initialization: poor weight initialization can also cause vanishing gradients; if the weights are badly scaled, the gradient shrinks from layer to layer as it propagates backwards.
The product effect of the chain rule: gradients in a deep network are computed by the chain rule, so the gradient at an early layer is a product of many per-layer factors. If those factors are mostly below one, the product shrinks geometrically with depth and eventually vanishes.
Inappropriate optimizer or learning rate: a learning rate that is too large can destabilize training (exploding updates), while one that is too small slows learning and leaves an already weak gradient signal with little effect.
Long-range dependencies: in recurrent networks (RNNs), long-range dependency problems can also cause vanishing gradients. When the network must learn dependencies across long time sequences, the gradient decays as it is propagated back through time, making long-term dependencies hard to capture.
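The saturation point above can be made concrete. The sigmoid's derivative is σ'(x) = σ(x)(1 − σ(x)), which peaks at 0.25 (at x = 0), so a chain of sigmoid layers multiplies the gradient by at most 0.25 per layer even in the best case:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the maximum possible value
print(sigmoid_grad(5.0))  # ~0.0066, deep in the saturated region

# Best case: every layer sits exactly at the maximum derivative
bound = 0.25 ** 30
print(f"upper bound on the gradient after 30 sigmoid layers: {bound:.3e}")
```

Even with every neuron at the ideal operating point, thirty sigmoid layers cap the gradient at roughly 1e-18, which is why ReLU-family activations and residual connections became standard in deep networks.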
```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, Add, \
    MaxPooling2D, AveragePooling2D, Flatten, Dense
from tensorflow.keras.models import Model

def residual_block(x, filters, kernel_size=(3, 3), strides=(1, 1), activation='relu'):
    y = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation(activation)(y)
    y = Conv2D(filters, kernel_size=kernel_size, strides=(1, 1), padding='same')(y)
    y = BatchNormalization()(y)
    # Project the shortcut when the spatial size or channel count changes
    if strides != (1, 1) or x.shape[-1] != filters:
        x = Conv2D(filters, kernel_size=(1, 1), strides=strides, padding='same')(x)
    out = Add()([x, y])
    out = Activation(activation)(out)
    return out

def ResNet(input_shape=(224, 224, 3), num_classes=1000):
    inputs = Input(shape=input_shape)
    # Conv1
    x = Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # Conv2_x
    x = residual_block(x, filters=64)
    x = residual_block(x, filters=64)
    # Conv3_x
    x = residual_block(x, filters=128, strides=(2, 2))
    x = residual_block(x, filters=128)
    # Conv4_x
    x = residual_block(x, filters=256, strides=(2, 2))
    x = residual_block(x, filters=256)
    # Conv5_x
    x = residual_block(x, filters=512, strides=(2, 2))
    x = residual_block(x, filters=512)
    x = AveragePooling2D(pool_size=(7, 7))(x)
    x = Flatten()(x)
    outputs = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs, outputs, name='ResNet')
    return model

# Create ResNet model
resnet_model = ResNet()

# Print model summary
resnet_model.summary()
```
```
Model: "ResNet"
__________________________________________________________________________________________________
 Layer (type)                          Output Shape           Param #   Connected to
==================================================================================================
 input_1 (InputLayer)                  [(None, 224, 224, 3)]  0         []
 conv2d (Conv2D)                       (None, 112, 112, 64)   9472      ['input_1[0][0]']
 batch_normalization (BatchNorm.)      (None, 112, 112, 64)   256       ['conv2d[0][0]']
 activation (Activation)               (None, 112, 112, 64)   0         ['batch_normalization[0][0]']
 max_pooling2d (MaxPooling2D)          (None, 56, 56, 64)     0         ['activation[0][0]']
 conv2d_1 (Conv2D)                     (None, 56, 56, 64)     36928     ['max_pooling2d[0][0]']
 batch_normalization_1 (BatchNorm.)    (None, 56, 56, 64)     256       ['conv2d_1[0][0]']
 activation_1 (Activation)             (None, 56, 56, 64)     0         ['batch_normalization_1[0][0]']
 conv2d_2 (Conv2D)                     (None, 56, 56, 64)     36928     ['activation_1[0][0]']
 batch_normalization_2 (BatchNorm.)    (None, 56, 56, 64)     256       ['conv2d_2[0][0]']
 add (Add)                             (None, 56, 56, 64)     0         ['max_pooling2d[0][0]', 'batch_normalization_2[0][0]']
 activation_2 (Activation)             (None, 56, 56, 64)     0         ['add[0][0]']
 conv2d_3 (Conv2D)                     (None, 56, 56, 64)     36928     ['activation_2[0][0]']
 batch_normalization_3 (BatchNorm.)    (None, 56, 56, 64)     256       ['conv2d_3[0][0]']
 activation_3 (Activation)             (None, 56, 56, 64)     0         ['batch_normalization_3[0][0]']
 conv2d_4 (Conv2D)                     (None, 56, 56, 64)     36928     ['activation_3[0][0]']
 batch_normalization_4 (BatchNorm.)    (None, 56, 56, 64)     256       ['conv2d_4[0][0]']
 add_1 (Add)                           (None, 56, 56, 64)     0         ['activation_2[0][0]', 'batch_normalization_4[0][0]']
 activation_4 (Activation)             (None, 56, 56, 64)     0         ['add_1[0][0]']
 conv2d_5 (Conv2D)                     (None, 28, 28, 128)    73856     ['activation_4[0][0]']
 batch_normalization_5 (BatchNorm.)    (None, 28, 28, 128)    512       ['conv2d_5[0][0]']
 activation_5 (Activation)             (None, 28, 28, 128)    0         ['batch_normalization_5[0][0]']
 conv2d_6 (Conv2D)                     (None, 28, 28, 128)    147584    ['activation_5[0][0]']
 conv2d_7 (Conv2D)                     (None, 28, 28, 128)    8320      ['activation_4[0][0]']
 batch_normalization_6 (BatchNorm.)    (None, 28, 28, 128)    512       ['conv2d_6[0][0]']
 add_2 (Add)                           (None, 28, 28, 128)    0         ['conv2d_7[0][0]', 'batch_normalization_6[0][0]']
 activation_6 (Activation)             (None, 28, 28, 128)    0         ['add_2[0][0]']
 conv2d_8 (Conv2D)                     (None, 28, 28, 128)    147584    ['activation_6[0][0]']
 batch_normalization_7 (BatchNorm.)    (None, 28, 28, 128)    512       ['conv2d_8[0][0]']
 activation_7 (Activation)             (None, 28, 28, 128)    0         ['batch_normalization_7[0][0]']
 conv2d_9 (Conv2D)                     (None, 28, 28, 128)    147584    ['activation_7[0][0]']
 batch_normalization_8 (BatchNorm.)    (None, 28, 28, 128)    512       ['conv2d_9[0][0]']
 add_3 (Add)                           (None, 28, 28, 128)    0         ['activation_6[0][0]', 'batch_normalization_8[0][0]']
 activation_8 (Activation)             (None, 28, 28, 128)    0         ['add_3[0][0]']
 conv2d_10 (Conv2D)                    (None, 14, 14, 256)    295168    ['activation_8[0][0]']
 batch_normalization_9 (BatchNorm.)    (None, 14, 14, 256)    1024      ['conv2d_10[0][0]']
 activation_9 (Activation)             (None, 14, 14, 256)    0         ['batch_normalization_9[0][0]']
 conv2d_11 (Conv2D)                    (None, 14, 14, 256)    590080    ['activation_9[0][0]']
 conv2d_12 (Conv2D)                    (None, 14, 14, 256)    33024     ['activation_8[0][0]']
 batch_normalization_10 (BatchNorm.)   (None, 14, 14, 256)    1024      ['conv2d_11[0][0]']
 add_4 (Add)                           (None, 14, 14, 256)    0         ['conv2d_12[0][0]', 'batch_normalization_10[0][0]']
 activation_10 (Activation)            (None, 14, 14, 256)    0         ['add_4[0][0]']
 conv2d_13 (Conv2D)                    (None, 14, 14, 256)    590080    ['activation_10[0][0]']
 batch_normalization_11 (BatchNorm.)   (None, 14, 14, 256)    1024      ['conv2d_13[0][0]']
 activation_11 (Activation)            (None, 14, 14, 256)    0         ['batch_normalization_11[0][0]']
 conv2d_14 (Conv2D)                    (None, 14, 14, 256)    590080    ['activation_11[0][0]']
 batch_normalization_12 (BatchNorm.)   (None, 14, 14, 256)    1024      ['conv2d_14[0][0]']
 add_5 (Add)                           (None, 14, 14, 256)    0         ['activation_10[0][0]', 'batch_normalization_12[0][0]']
 activation_12 (Activation)            (None, 14, 14, 256)    0         ['add_5[0][0]']
 conv2d_15 (Conv2D)                    (None, 7, 7, 512)      1180160   ['activation_12[0][0]']
 batch_normalization_13 (BatchNorm.)   (None, 7, 7, 512)      2048      ['conv2d_15[0][0]']
 activation_13 (Activation)            (None, 7, 7, 512)      0         ['batch_normalization_13[0][0]']
 conv2d_16 (Conv2D)                    (None, 7, 7, 512)      2359808   ['activation_13[0][0]']
 conv2d_17 (Conv2D)                    (None, 7, 7, 512)      131584    ['activation_12[0][0]']
 batch_normalization_14 (BatchNorm.)   (None, 7, 7, 512)      2048      ['conv2d_16[0][0]']
 add_6 (Add)                           (None, 7, 7, 512)      0         ['conv2d_17[0][0]', 'batch_normalization_14[0][0]']
 activation_14 (Activation)            (None, 7, 7, 512)      0         ['add_6[0][0]']
 conv2d_18 (Conv2D)                    (None, 7, 7, 512)      2359808   ['activation_14[0][0]']
 batch_normalization_15 (BatchNorm.)   (None, 7, 7, 512)      2048      ['conv2d_18[0][0]']
 activation_15 (Activation)            (None, 7, 7, 512)      0         ['batch_normalization_15[0][0]']
 conv2d_19 (Conv2D)                    (None, 7, 7, 512)      2359808   ['activation_15[0][0]']
 batch_normalization_16 (BatchNorm.)   (None, 7, 7, 512)      2048      ['conv2d_19[0][0]']
 add_7 (Add)                           (None, 7, 7, 512)      0         ['activation_14[0][0]', 'batch_normalization_16[0][0]']
 activation_16 (Activation)            (None, 7, 7, 512)      0         ['add_7[0][0]']
 average_pooling2d (AveragePooling2D)  (None, 1, 1, 512)      0         ['activation_16[0][0]']
 flatten (Flatten)                     (None, 512)            0         ['average_pooling2d[0][0]']
 dense (Dense)                         (None, 1000)           513000    ['flatten[0][0]']
==================================================================================================
Total params: 11700328 (44.63 MB)
Trainable params: 11692520 (44.60 MB)
Non-trainable params: 7808 (30.50 KB)
__________________________________________________________________________________________________
```
MNIST is a widely used handwritten-digit recognition dataset containing images of the digits 0-9 written by 250 different people. Each image is grayscale, 28x28 pixels.
The original MNIST database consists of the following four files:
![[Pasted image 20240408171823.png]]
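The raw files above use the simple IDX binary format: a big-endian magic number (2051 for image files, 2049 for label files), the dimension sizes, then the raw pixel or label bytes. A sketch of a header parser, exercised on a synthetic byte string rather than a real download:

```python
import struct

def parse_idx_image_header(data):
    """Parse the 16-byte header of an IDX image file (e.g. train-images-idx3-ubyte)."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])  # big-endian uint32s
    assert magic == 2051, "not an IDX image file"
    return count, rows, cols

# Synthetic header mimicking train-images-idx3-ubyte: 60000 images of 28x28
header = struct.pack(">IIII", 2051, 60000, 28, 28)
print(parse_idx_image_header(header))  # (60000, 28, 28)
```

Frameworks such as TensorFlow hide this parsing behind `load_data()`, as shown next.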
In Python, MNIST can be loaded with TensorFlow or another deep-learning framework. The example below uses TensorFlow:
```python
import tensorflow as tf

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Print the sizes of the training and test sets
print("Training set:", x_train.shape, y_train.shape)
print("Test set:", x_test.shape, y_test.shape)
```
```python
import tensorflow as tf

# Load the Fashion-MNIST dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Normalize pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Print the sizes of the training and test sets
print("Training set:", x_train.shape, y_train.shape)
print("Test set:", x_test.shape, y_test.shape)
```
```python
import tensorflow as tf

# Load the CIFAR-10 dataset
cifar10 = tf.keras.datasets.cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Print the sizes of the training and test sets
print("Training set:", x_train.shape, y_train.shape)
print("Test set:", x_test.shape, y_test.shape)
```
```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the PASCAL VOC 2007 dataset
dataset, info = tfds.load('voc/2007', split='train', with_info=True)

# Print dataset information
print(info)

# Show the first few examples in the dataset
for example in dataset.take(5):
    image, label = example["image"], example["objects"]["label"]
    print("Image shape:", image.shape)
    print("Labels:", label)
```
COCO stands for Microsoft Common Objects in Context; it originated as the Microsoft COCO dataset, annotated with funding from Microsoft starting in 2014.
The dataset targets scene understanding, with images mostly drawn from complex everyday scenes.
It provides annotations for object classification (recognition), detection, segmentation, and semantic labeling.
After the ImageNet competition was discontinued, the COCO challenge became one of the most authoritative and important benchmarks for object recognition and detection.
Website: http://cocodataset.org
It provides 80 annotated categories, over 330,000 images (about 200,000 of them annotated), and more than 1.5 million object instances in total.
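COCO distributes its annotations as JSON with `images`, `annotations`, and `categories` sections, and boxes stored as `[x, y, width, height]`. A minimal hand-made record (the values here are made up, not real COCO data) shows the layout:

```python
import json

# A tiny hand-made annotation file in COCO's layout (values are made up)
coco = {
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 18,
                     "bbox": [100.0, 50.0, 120.0, 80.0], "area": 9600.0}],
    "categories": [{"id": 18, "name": "dog", "supercategory": "animal"}],
}

# Round-trip through JSON, as a real instances_*.json file would be read
parsed = json.loads(json.dumps(coco))

ann = parsed["annotations"][0]
x, y, w, h = ann["bbox"]
print("box area:", w * h)  # 9600.0, matches the stored 'area' field
names = {c["id"]: c["name"] for c in parsed["categories"]}
print("category:", names[ann["category_id"]])  # dog
```

Real files are read the same way, just with hundreds of thousands of entries; the `pycocotools` library wraps this lookup logic.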
The ImageNet dataset
The ILSVRC 2012 subset