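Transfer learning consists of taking features learned on one problem and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify raccoons may be useful to kick-start a model meant to identify tanukis.
Transfer learning is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
The most common incarnation of transfer learning in the context of deep learning is the following workflow: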
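1. Take layers from a previously trained model.
2. Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
3. Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.
4. Train the new layers on your dataset.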
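A last, optional step is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it) and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements by incrementally adapting the pretrained features to the new data.
First, we will go over the Keras trainable API in detail, which underlies most transfer learning and fine-tuning workflows.
Then, we will demonstrate the typical workflow by taking a model pretrained on the ImageNet dataset and retraining it on the Kaggle "cats vs. dogs" classification dataset.
Layers and models have three weight attributes: weights is the list of all weight variables of the layer, trainable_weights is the list of those that are meant to be updated (via gradient descent) to minimize the loss during training, and non_trainable_weights is the list of those that are not meant to be trained. For example, a Dense layer has 2 trainable weights (kernel and bias):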
import numpy as np
import tensorflow as tf
from tensorflow import keras
layer = keras.layers.Dense(3)
layer.build((None, 4)) # Create the weights
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
weights: 2
trainable_weights: 2
non_trainable_weights: 0
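In general, all weights are trainable weights. The only built-in layer that has non-trainable weights is the BatchNormalization layer. It uses non-trainable weights to keep track of the mean and variance of its inputs during training. To learn how to use non-trainable weights in your own custom layers, see the guide to writing new layers from scratch.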
layer = keras.layers.BatchNormalization()
layer.build((None, 4)) # Create the weights
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
weights: 4
trainable_weights: 2
non_trainable_weights: 2
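Layers and models also feature a boolean attribute, trainable. Its value can be changed. Setting layer.trainable to False moves all the layer's weights from trainable to non-trainable. This is called "freezing" the layer: the state of a frozen layer won't be updated during training (either when training with fit() or when training with any custom loop that relies on trainable_weights to apply gradient updates).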
layer = keras.layers.Dense(3)
layer.build((None, 4)) # Create the weights
layer.trainable = False # Freeze the layer
print("weights:", len(layer.weights))
print("trainable_weights:", len(layer.trainable_weights))
print("non_trainable_weights:", len(layer.non_trainable_weights))
weights: 2
trainable_weights: 0
non_trainable_weights: 2
When a trainable weight becomes non-trainable, its value is no longer updated during training.
# Make a model with 2 layers
layer1 = keras.layers.Dense(3, activation="relu")
layer2 = keras.layers.Dense(3, activation="sigmoid")
model = keras.Sequential([keras.Input(shape=(3,)), layer1, layer2])
# Freeze the first layer
layer1.trainable = False
# Keep a copy of the weights of layer1 for later reference
initial_layer1_weights_values = layer1.get_weights()
# Train the model
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)))
# Check that the weights of layer1 have not changed during training
final_layer1_weights_values = layer1.get_weights()
np.testing.assert_allclose(
    initial_layer1_weights_values[0], final_layer1_weights_values[0]
)
np.testing.assert_allclose(
    initial_layer1_weights_values[1], final_layer1_weights_values[1]
)
1/1 [==============================] - 0s 333ms/step - loss: 0.1007
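Do not confuse the layer.trainable attribute with the training argument in layer.call() (which controls whether the layer should run its forward pass in inference mode or training mode). For more information, see the Keras FAQ.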
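To make the distinction concrete, here is a minimal sketch using a Dropout layer, which has no weights to freeze. It assumes TensorFlow 2.x with its bundled Keras, and the input array is made up for illustration. Setting trainable to False does not change what the forward pass does here, while the training argument does.

```python
import numpy as np
from tensorflow import keras

# Dropout has no weights, so `trainable` has nothing to freeze.
layer = keras.layers.Dropout(0.5)
layer.trainable = False

# Illustrative input: a single row of ones.
x = np.ones((1, 1000), dtype="float32")

# Inference mode: dropout is a no-op and the input passes through unchanged.
out_inference = np.array(layer(x, training=False))

# Training mode: some inputs are zeroed and the rest are scaled by 1 / (1 - rate),
# even though the layer is marked as non-trainable.
out_training = np.array(layer(x, training=True))

print("inference output equals input:", np.allclose(out_inference, x))
print("training output contains zeros:", bool((out_training == 0).any()))
```

BatchNormalization is the one built-in layer where the two settings interact, as discussed later in this guide.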
If you set trainable = False on a model or on any layer that has sublayers, all child layers become non-trainable as well.
Example:
inner_model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        keras.layers.Dense(3, activation="relu"),
        keras.layers.Dense(3, activation="relu"),
    ]
)
model = keras.Sequential(
    [keras.Input(shape=(3,)), inner_model, keras.layers.Dense(3, activation="sigmoid"),]
)
model.trainable = False  # Freeze the outer model
assert inner_model.trainable == False  # All layers in `model` are now frozen
assert inner_model.layers[0].trainable == False  # `trainable` is propagated recursively
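This leads us to how a typical transfer learning workflow can be implemented in Keras:
1. Instantiate a base model and load pre-trained weights into it.
2. Freeze all layers in the base model by setting trainable = False.
3. Create a new model on top of the output of one (or several) layers from the base model.
4. Train your new model on your new dataset.
Note that an alternative, more lightweight workflow could also be:
1. Instantiate a base model and load pre-trained weights into it.
2. Run your new dataset through it and record the output of one (or several) layers from the base model. This is called feature extraction.
3. Use that output as input data for a new, smaller model.
A key advantage of that second workflow is that you only run the base model once on your data, rather than once per epoch of training. So it is a lot faster and cheaper.
An issue with that second workflow, though, is that it doesn't allow you to dynamically modify the input data of your new model during training, which is required when doing data augmentation, for instance. Transfer learning is typically used for tasks where your new dataset has too little data to train a full-scale model from scratch, and in such scenarios data augmentation is very important. So in what follows, we will focus on the first workflow.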
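For reference, the second, lightweight workflow can be sketched as follows. This is a minimal illustration under stated assumptions: the base model is a small, randomly initialized stand-in (so the sketch runs without downloading weights), and the data is random. In practice the base model would carry pre-trained weights.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for a pre-trained base model (random weights here).
base_model = keras.Sequential(
    [keras.Input(shape=(8,)), keras.layers.Dense(16, activation="relu")]
)

# Made-up dataset: 32 samples with 8 features and binary labels.
x = np.random.random((32, 8)).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

# Feature extraction: run the base model over the data exactly once.
features = base_model.predict(x, verbose=0)

# Train a new, smaller model directly on the recorded features.
head = keras.Sequential([keras.Input(shape=(16,)), keras.layers.Dense(1)])
head.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
)
head.fit(features, y, epochs=2, verbose=0)

print("feature shape:", features.shape)
```

Because the features are computed ahead of time, nothing upstream of the small model can change between epochs, which is why data augmentation does not fit this workflow.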
Here is what the first workflow looks like in Keras:
First, instantiate a base model with pre-trained weights.
base_model = keras.applications.Xception(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False)  # Do not include the ImageNet classifier at the top.
Then, freeze the base model.
base_model.trainable = False
Create a new model on top.
inputs = keras.Input(shape=(150, 150, 3))
# We make sure that the base_model is running in inference mode here,
# by passing `training=False`. This is important for fine-tuning, as you will
# learn in a few paragraphs.
x = base_model(inputs, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
# A Dense classifier with a single unit (binary classification)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
Train the model on new data.
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
model.fit(new_dataset, epochs=20, callbacks=..., validation_data=...)
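The model above ends in Dense(1) with no activation, so it outputs a raw score (a logit) rather than a probability, which is why the loss is created with from_logits=True. The following framework-free sketch shows how the two views relate. The function names are illustrative and are not part of Keras.

```python
import math


def sigmoid(logit):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def bce_from_logit(target, logit):
    """Binary cross-entropy computed directly from a logit.

    Uses the numerically stable form
    max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_from_probability(target, prob):
    """Textbook binary cross-entropy computed from a probability."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


logit = 2.0   # example raw model output for one image
target = 1.0  # example label
prob = sigmoid(logit)

print("probability:", prob)
print("loss from logit:", bce_from_logit(target, logit))
print("loss from probability:", bce_from_probability(target, prob))
```

At prediction time, applying a sigmoid to the model output turns it into a probability for the positive class.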
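Once your model has converged on the new data, you can try to unfreeze all or part of the base model and retrain the whole model end-to-end with a very low learning rate.
This is an optional last step that can potentially give you incremental improvements. It could also lead to quick overfitting, so keep that in mind.
It is critical to only do this step after the model with frozen layers has been trained to convergence. If you mix randomly initialized trainable layers with trainable layers that hold pre-trained features, the randomly initialized layers will cause very large gradient updates during training, which will destroy your pre-trained features.
It is also critical to use a very low learning rate at this stage, because you are training a much larger model than in the first round of training, on a dataset that is typically very small. As a result, you are at risk of overfitting very quickly if you apply large weight updates. Here, you only want to readapt the pretrained weights in an incremental way.
This is how to implement fine-tuning of the whole base model: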
# Unfreeze the base model
base_model.trainable = True
# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account
model.compile(optimizer=keras.optimizers.Adam(1e-5),  # Very low learning rate
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
# Train end-to-end. Be careful to stop before you overfit!
model.fit(new_dataset, epochs=10, callbacks=..., validation_data=...)
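The snippet above unfreezes the whole base model. To unfreeze only part of it, one option is to set trainable on individual layers. The following is a minimal sketch, assuming a small, randomly initialized stand-in for the pre-trained base model; the layer names and sizes are made up for illustration.

```python
from tensorflow import keras

# Hypothetical stand-in for a pre-trained base model with three blocks.
base_model = keras.Sequential(
    [
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu", name="block1"),
        keras.layers.Dense(16, activation="relu", name="block2"),
        keras.layers.Dense(16, activation="relu", name="block3"),
    ]
)

# Unfreeze the model as a whole, then re-freeze everything except the last layer.
base_model.trainable = True
for layer in base_model.layers[:-1]:
    layer.trainable = False

trainable_names = [layer.name for layer in base_model.layers if layer.trainable]
print("trainable layers:", trainable_names)
print("trainable weights:", len(base_model.trainable_weights))
```

As with full unfreezing, the enclosing model should be recompiled after these changes so that they are taken into account.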
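Important note about compile() and trainable
Calling compile() on a model is meant to "freeze" the behavior of that model. This implies that the trainable attribute values at the time the model is compiled should be preserved throughout the lifetime of that model, until compile is called again. Hence, if you change any trainable value, make sure to call compile() again on your model for your changes to be taken into account.
Important notes about the BatchNormalization layer
Many image models contain BatchNormalization layers. That layer is a special case on every imaginable count. Here are a few things to keep in mind.
1. BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables tracking the mean and variance of the inputs.
2. When you set bn_layer.trainable = False, the BatchNormalization layer will run in inference mode and will not update its mean and variance statistics. This is not the case for other layers in general, as weight trainability and inference/training modes are two orthogonal concepts. But the two are tied in the case of the BatchNormalization layer.
3. When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training=False when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.
You will see this pattern in action in the end-to-end example at the end of this guide.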
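The link between trainable and inference mode in BatchNormalization can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x Keras behavior; the input data is random and only serves to move the tracked statistics.

```python
import numpy as np
from tensorflow import keras

bn = keras.layers.BatchNormalization()
bn.build((None, 4))

# Random inputs whose mean (about 5) differs from the initial moving mean (0).
x = np.random.normal(loc=5.0, scale=1.0, size=(64, 4)).astype("float32")


def moving_mean(layer):
    """Return a copy of the layer's tracked input mean."""
    return layer.moving_mean.numpy().copy()


start = moving_mean(bn)

bn(x, training=False)  # Inference mode: statistics are not updated.
after_inference = moving_mean(bn)

bn(x, training=True)  # Training mode: statistics are updated.
after_training = moving_mean(bn)

bn.trainable = False  # Freezing the layer also forces inference mode.
bn(x, training=True)
after_frozen = moving_mean(bn)

print("changed in inference mode:", not np.allclose(start, after_inference))
print("changed in training mode:", not np.allclose(after_inference, after_training))
print("changed when frozen:", not np.allclose(after_training, after_frozen))
```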
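If instead of fit() you are using your own low-level training loop, the workflow stays essentially the same. You should be careful to only take into account the list model.trainable_weights when applying gradient updates: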
# Create base model
base_model = keras.applications.Xception(
    weights='imagenet',
    input_shape=(150, 150, 3),
    include_top=False)
# Freeze base model
base_model.trainable = False
# Create new model on top.
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = keras.optimizers.Adam()
# Iterate over the batches of a dataset.
for inputs, targets in new_dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        predictions = model(inputs)
        # Compute the loss value for this batch.
        loss_value = loss_fn(targets, predictions)
    # Get gradients of loss wrt the *trainable* weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
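Likewise for fine-tuning.
To solidify these concepts, let's walk through a concrete end-to-end transfer learning and fine-tuning example. We will load the Xception model, pre-trained on ImageNet, and use it on the Kaggle "cats vs. dogs" classification dataset.
First, let's fetch the cats vs. dogs dataset using TFDS. If you have your own dataset, you will probably want to use the utility tf.keras.utils.image_dataset_from_directory to generate similar labeled dataset objects from a set of images on disk filed into class-specific folders.
Transfer learning is most useful when working with very small datasets. To keep our dataset small, we will use 40% of the original training data (25,000 images) for training, 10% for validation, and 10% for testing.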
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
train_ds, validation_ds, test_ds = tfds.load(
    "cats_vs_dogs",
    # Reserve 10% for validation and 10% for test
    split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],
    as_supervised=True,  # Include labels
)
print("Number of training samples: %d" % tf.data.experimental.cardinality(train_ds))
print(
    "Number of validation samples: %d" % tf.data.experimental.cardinality(validation_ds)
)
print("Number of test samples: %d" % tf.data.experimental.cardinality(test_ds))
Number of training samples: 9305
Number of validation samples: 2326
Number of test samples: 2326
These are the first 9 images in the training dataset. As you can see, they are all different sizes.
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for i, (image, label) in enumerate(train_ds.take(9)):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image)
    plt.title(int(label))
    plt.axis("off")
We can also see that label 1 is "dog" and label 0 is "cat".
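Our raw images have a variety of sizes. In addition, each pixel consists of 3 integer values between 0 and 255 (RGB level values). This isn't a great fit for feeding a neural network. We need to do 2 things:
1. Standardize to a fixed image size. We pick 150x150.
2. Normalize pixel values between -1 and 1. We will do this as part of the model itself.
In general, it is good practice to develop models that take raw data as input, as opposed to models that take already-preprocessed data. The reason is that, if your model expects preprocessed data, any time you export your model to use it elsewhere (in a web browser, in a mobile app), you will need to reimplement the exact same preprocessing pipeline. This gets very tricky very quickly. So we should do the least possible amount of preprocessing before hitting the model.
Here, we will do image resizing in the data pipeline (because a deep neural network can only process contiguous batches of data), and we will do the input value scaling as part of the model, when we create it.
Let's resize images to 150x150: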
size = (150, 150)
train_ds = train_ds.map(lambda x, y: (tf.image.resize(x, size), y))
validation_ds = validation_ds.map(lambda x, y: (tf.image.resize(x, size), y))
test_ds = test_ds.map(lambda x, y: (tf.image.resize(x, size), y))
Besides, let's batch the data and use caching and prefetching to optimize loading speed.
batch_size = 32
train_ds = train_ds.cache().batch(batch_size).prefetch(buffer_size=10)
validation_ds = validation_ds.cache().batch(batch_size).prefetch(buffer_size=10)
test_ds = test_ds.cache().batch(batch_size).prefetch(buffer_size=10)
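To see what this pipeline does to the data, here is a minimal sketch on a toy dataset of ten integers standing in for images. The toy dataset and the batch size of 4 are assumptions chosen for illustration only.

```python
import tensorflow as tf

# Toy stand-in for an image dataset: the integers 0 through 9.
toy_ds = tf.data.Dataset.range(10)

# Same chain of calls as above: cache elements, group them into batches,
# and prefetch batches in the background while the consumer is busy.
toy_ds = toy_ds.cache().batch(4).prefetch(buffer_size=10)

batches = [batch.numpy().tolist() for batch in toy_ds]
for batch in batches:
    print(batch)
```

The last batch can be smaller than the batch size when the number of samples is not a multiple of it.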
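When you don't have a large image dataset, it is good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images, such as random horizontal flipping or small random rotations. This helps expose the model to different aspects of the training data while slowing down overfitting.
from tensorflow import keras
from tensorflow.keras import layers
data_augmentation = keras.Sequential(
    [layers.RandomFlip("horizontal"), layers.RandomRotation(0.1),]
)
Let's visualize what the first image of the first batch looks like after various random transformations:
import numpy as np
for images, labels in train_ds.take(1):
    plt.figure(figsize=(10, 10))
    first_image = images[0]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(
            tf.expand_dims(first_image, 0), training=True
        )
        plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.title(int(labels[0]))
        plt.axis("off")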
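Now let's build a model that follows the blueprint we explained earlier.
Note that:
1. We add a Rescaling layer to scale input values (initially in the [0, 255] range) to the [-1, 1] range.
2. We add a Dropout layer before the classification layer, for regularization.
3. We make sure to pass training=False when calling the base model, so that it runs in inference mode and batchnorm statistics don't get updated even after we unfreeze the base model for fine-tuning.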
base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,
)  # Do not include the ImageNet classifier at the top.
# Freeze the base_model
base_model.trainable = False
# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs)  # Apply random data augmentation
# Pre-trained Xception weights requires that input be scaled
# from (0, 255) to a range of (-1., +1.), the rescaling layer
# outputs: `(inputs * scale) + offset`
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(x)
# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_5 (InputLayer)         [(None, 150, 150, 3)]     0
_________________________________________________________________
sequential_3 (Sequential)    (None, 150, 150, 3)       0
_________________________________________________________________
rescaling (Rescaling)        (None, 150, 150, 3)       0
_________________________________________________________________
xception (Functional)        (None, 5, 5, 2048)        20861480
_________________________________________________________________
global_average_pooling2d (Gl (None, 2048)              0
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 2049
=================================================================
Total params: 20,863,529
Trainable params: 2,049
Non-trainable params: 20,861,480
_________________________________________________________________
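The Rescaling layer in the model above computes inputs * scale + offset with scale = 1 / 127.5 and offset = -1. As a quick check of that arithmetic in plain Python (no Keras involved; the helper name and sample pixel values are illustrative):

```python
scale = 1 / 127.5
offset = -1


def rescale(pixel_value):
    """Apply the same formula as the Rescaling layer: inputs * scale + offset."""
    return pixel_value * scale + offset


# A few sample pixel intensities from the raw [0, 255] range.
for pixel_value in (0, 64, 128, 255):
    print(pixel_value, "->", round(rescale(pixel_value), 4))
```

The endpoints 0 and 255 map to -1 and +1, so every raw pixel value lands inside the range the pre-trained weights expect.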
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
epochs = 20
model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
Epoch 1/20
291/291 [==============================] - 133s 451ms/step - loss: 0.1670 - binary_accuracy: 0.9267 - val_loss: 0.0830 - val_binary_accuracy: 0.9716
Epoch 2/20
291/291 [==============================] - 135s 465ms/step - loss: 0.1208 - binary_accuracy: 0.9502 - val_loss: 0.0768 - val_binary_accuracy: 0.9716
Epoch 3/20
291/291 [==============================] - 135s 463ms/step - loss: 0.1062 - binary_accuracy: 0.9572 - val_loss: 0.0757 - val_binary_accuracy: 0.9716
Epoch 4/20
291/291 [==============================] - 137s 469ms/step - loss: 0.1024 - binary_accuracy: 0.9554 - val_loss: 0.0733 - val_binary_accuracy: 0.9725
Epoch 5/20
291/291 [==============================] - 137s 470ms/step - loss: 0.1004 - binary_accuracy: 0.9587 - val_loss: 0.0735 - val_binary_accuracy: 0.9729
Epoch 6/20
291/291 [==============================] - 136s 467ms/step - loss: 0.0979 - binary_accuracy: 0.9577 - val_loss: 0.0747 - val_binary_accuracy: 0.9708
Epoch 7/20
291/291 [==============================] - 134s 462ms/step - loss: 0.0998 - binary_accuracy: 0.9596 - val_loss: 0.0706 - val_binary_accuracy: 0.9725
Epoch 8/20
291/291 [==============================] - 133s 457ms/step - loss: 0.1029 - binary_accuracy: 0.9592 - val_loss: 0.0720 - val_binary_accuracy: 0.9733
Epoch 9/20
291/291 [==============================] - 135s 466ms/step - loss: 0.0937 - binary_accuracy: 0.9625 - val_loss: 0.0707 - val_binary_accuracy: 0.9721
Epoch 10/20
291/291 [==============================] - 137s 472ms/step - loss: 0.0967 - binary_accuracy: 0.9580 - val_loss: 0.0720 - val_binary_accuracy: 0.9712
Epoch 11/20
291/291 [==============================] - 135s 463ms/step - loss: 0.0961 - binary_accuracy: 0.9612 - val_loss: 0.0802 - val_binary_accuracy: 0.9699
Epoch 12/20
291/291 [==============================] - 134s 460ms/step - loss: 0.0963 - binary_accuracy: 0.9638 - val_loss: 0.0721 - val_binary_accuracy: 0.9716
Epoch 13/20
291/291 [==============================] - 136s 468ms/step - loss: 0.0925 - binary_accuracy: 0.9635 - val_loss: 0.0736 - val_binary_accuracy: 0.9686
Epoch 14/20
291/291 [==============================] - 138s 476ms/step - loss: 0.0909 - binary_accuracy: 0.9624 - val_loss: 0.0766 - val_binary_accuracy: 0.9703
Epoch 15/20
291/291 [==============================] - 136s 467ms/step - loss: 0.0949 - binary_accuracy: 0.9598 - val_loss: 0.0704 - val_binary_accuracy: 0.9725
Epoch 16/20
291/291 [==============================] - 133s 456ms/step - loss: 0.0969 - binary_accuracy: 0.9586 - val_loss: 0.0722 - val_binary_accuracy: 0.9708
Epoch 17/20
291/291 [==============================] - 135s 464ms/step - loss: 0.0913 - binary_accuracy: 0.9635 - val_loss: 0.0718 - val_binary_accuracy: 0.9716
Epoch 18/20
291/291 [==============================] - 137s 472ms/step - loss: 0.0915 - binary_accuracy: 0.9639 - val_loss: 0.0727 - val_binary_accuracy: 0.9725
Epoch 19/20
291/291 [==============================] - 134s 460ms/step - loss: 0.0938 - binary_accuracy: 0.9631 - val_loss: 0.0707 - val_binary_accuracy: 0.9733
Epoch 20/20
291/291 [==============================] - 134s 460ms/step - loss: 0.0971 - binary_accuracy: 0.9609 - val_loss: 0.0714 - val_binary_accuracy: 0.9716
<keras.callbacks.History at 0x7f4494e38f70>
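Finally, let's unfreeze the base model and train the entire model end-to-end with a low learning rate.
Importantly, although the base model becomes trainable, it is still running in inference mode, since we passed training=False when calling it while building the model. This means that the batch normalization layers inside won't update their batch statistics. If they did, they would wreak havoc on the representations learned by the model so far.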
# Unfreeze the base_model. Note that it keeps running in inference mode
# since we passed `training=False` when calling it. This means that
# the batchnorm layers will not update their batch statistics.
# This prevents the batchnorm layers from undoing all the training
# we've done so far.
base_model.trainable = True
model.summary()
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Low learning rate
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
epochs = 10
model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_5 (InputLayer)         [(None, 150, 150, 3)]     0
_________________________________________________________________
sequential_3 (Sequential)    (None, 150, 150, 3)       0
_________________________________________________________________
rescaling (Rescaling)        (None, 150, 150, 3)       0
_________________________________________________________________
xception (Functional)        (None, 5, 5, 2048)        20861480
_________________________________________________________________
global_average_pooling2d (Gl (None, 2048)              0
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 2049
=================================================================
Total params: 20,863,529
Trainable params: 20,809,001
Non-trainable params: 54,528
_________________________________________________________________
Epoch 1/10
291/291 [==============================] - 567s 2s/step - loss: 0.0749 - binary_accuracy: 0.9689 - val_loss: 0.0605 - val_binary_accuracy: 0.9776
Epoch 2/10
291/291 [==============================] - 551s 2s/step - loss: 0.0559 - binary_accuracy: 0.9770 - val_loss: 0.0507 - val_binary_accuracy: 0.9798
Epoch 3/10
291/291 [==============================] - 545s 2s/step - loss: 0.0444 - binary_accuracy: 0.9832 - val_loss: 0.0502 - val_binary_accuracy: 0.9807
Epoch 4/10
291/291 [==============================] - 558s 2s/step - loss: 0.0365 - binary_accuracy: 0.9874 - val_loss: 0.0506 - val_binary_accuracy: 0.9807
Epoch 5/10
291/291 [==============================] - 550s 2s/step - loss: 0.0276 - binary_accuracy: 0.9890 - val_loss: 0.0477 - val_binary_accuracy: 0.9802
Epoch 6/10
291/291 [==============================] - 588s 2s/step - loss: 0.0206 - binary_accuracy: 0.9916 - val_loss: 0.0444 - val_binary_accuracy: 0.9832
Epoch 7/10
291/291 [==============================] - 542s 2s/step - loss: 0.0206 - binary_accuracy: 0.9923 - val_loss: 0.0502 - val_binary_accuracy: 0.9828
Epoch 8/10
291/291 [==============================] - 544s 2s/step - loss: 0.0153 - binary_accuracy: 0.9939 - val_loss: 0.0509 - val_binary_accuracy: 0.9819
Epoch 9/10
291/291 [==============================] - 548s 2s/step - loss: 0.0156 - binary_accuracy: 0.9934 - val_loss: 0.0610 - val_binary_accuracy: 0.9807
Epoch 10/10
291/291 [==============================] - 546s 2s/step - loss: 0.0176 - binary_accuracy: 0.9936 - val_loss: 0.0561 - val_binary_accuracy: 0.9789
<keras.callbacks.History at 0x7f4495056040>
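After 10 epochs, fine-tuning gains us a nice improvement here: validation accuracy rises from 0.9716 at the end of the first stage to 0.9789, peaking at 0.9832 in epoch 6. Validation loss climbs again after epoch 6 while training loss keeps falling, which is the overfitting risk noted earlier.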
https://keras.io/guides/transfer_learning/