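The first models one meets in machine learning are usually discriminative models, which are used for classification or recognition. A discriminative model is defined as follows:
A discriminative model is a statistical model that determines boundaries in observed data and uses these boundaries to make decisions or predictions.
Regression models and classification models are both discriminative models.
The GANs and diffusion models covered in this article belong to a different family, called generative models. A generative model is defined as follows:
A generative model describes how a dataset is generated in terms of a probabilistic model; sampling from the model allows us to generate new data that did not exist before.
A generative model produces new samples from existing ones. Its training data therefore usually has no labels (or targets), so generative models are typically trained with unsupervised learning. The new samples do not appear in the original set, but they resemble it in form.
For example, in the figure below a generative model produces new emoji from a set of existing ones.
Compared with discriminative models, generative models are probabilistic rather than deterministic.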
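To make "probabilistic" concrete, here is a minimal sketch of a generative model in miniature. It assumes the simplest possible setup, a one-dimensional Gaussian fitted to unlabeled data; the numbers and variable names are illustrative and not taken from the article.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical unlabeled "training set": 1,000 draws from an unknown process.
observed = [random.gauss(5.0, 2.0) for _ in range(1000)]

# "Training": estimate the parameters of the probabilistic model.
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

# "Generation": sample new points that resemble the training data.
new_samples = [random.gauss(mu, sigma) for _ in range(5)]
print(f"fitted mu={mu:.2f}, sigma={sigma:.2f}")
print("new samples:", [round(s, 2) for s in new_samples])
```

Each run with a different seed yields different new samples, whereas a discriminative model maps a given input to the same prediction every time.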
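This article focuses on two kinds of generative models:
GANs
DDPMs
Definition: "A machine learning model in which two neural networks compete with one another to become more accurate in their predictions - the two networks play a zero-sum game".
Applications of GANs:
Generator
One of the two neural networks in a GAN. It learns from existing samples to produce realistic new samples.
Discriminator
The other of the two neural networks in a GAN. Its main task is to identify which data are fakes produced by the generator.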
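The zero-sum game in the definition above is commonly written as the minimax objective of the original GAN formulation. The notation is assumed here and does not come from the article: $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for noise $z$, $p_{\text{data}}$ is the data distribution, and $p_z$ is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to increase $V$ while the generator tries to decrease the same quantity, which is what makes the game zero-sum.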
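When working with GANs, the following problems come up often.
None of them has been fully solved so far, and mitigating them is an active research direction.
Vanishing Gradients
If the discriminator does its job too well, it tends to give the generator little useful signal for updating its weights and other parameters, and the generator's gradients vanish.
The vanishing-gradient problem can be mitigated by using modified loss functions: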
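One widely used modification is the non-saturating generator loss. It is taken here as an illustrative assumption, not as the specific technique the text has in mind. The sketch below compares the gradient of the saturating loss log(1 - D(G(z))) with that of the non-saturating loss -log D(G(z)), both with respect to the discriminator's logit on a fake sample. The function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(logit):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(logit)

def grad_non_saturating(logit):
    # d/da [-log(sigmoid(a))] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(logit))

# A very negative logit means the discriminator confidently rejects the fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    print(f"logit={logit:6.1f}  saturating={grad_saturating(logit):.5f}  "
          f"non-saturating={grad_non_saturating(logit):.5f}")
```

As the discriminator becomes more confident, the saturating gradient shrinks toward zero while the non-saturating one approaches -1. The `generator_loss` defined later in this article, `bce(tf.ones_like(fake_output), fake_output)`, is of the non-saturating form.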
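Mode Collapse
Ideally, a GAN produces varied fake data across all of its random inputs. The generator may, however, learn only one plausible kind of output and then keep producing that same type of fake data. During training, if the generator finds that one type of output is especially good at fooling the discriminator, it may over-optimize for that particular discriminator and end up producing only a small set of very similar outputs. This failure is called mode collapse.
In mode collapse the generator gets stuck producing only one kind of the real data. Researchers use various techniques to mitigate the problem:
Failure to Converge
Failure to converge is one of the main ways GAN training fails. As training progresses, the discriminator may reach a point where it can hardly tell real from fake any more and starts giving what amounts to random verdicts. Such feedback not only fails to improve the generator, it can degrade it, so training does not converge.
Researchers use various techniques to mitigate this problem: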
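The Keras Model class:
tf.keras.losses.BinaryCrossentropy()
This is a callable loss class; calling an instance computes the cross-entropy loss between the true labels and the predicted labels. See the reference manual: Probabilistic losses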
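To show what this loss computes, here is a minimal pure-Python sketch of mean binary cross-entropy. The helper name `binary_crossentropy` and the clipping constant `eps` are assumptions made for illustration; this is not the TensorFlow implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over flat lists of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [0.0, 1.0, 0.0, 0.0]
y_pred = [0.6, 0.4, 0.4, 0.6]
print(round(binary_crossentropy(y_true, y_pred), 3))  # 0.815
```

Every wrong or unsure prediction contributes -log of the probability assigned to the correct class, so confident mistakes are penalized most.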
ones_like(), zeros_like()
Create a tensor of all ones or all zeros with the same shape as the input (a tensor being a scalar, vector, matrix, ...). See the reference manual: https://www.tensorflow.org/api_docs/python/tf/ones_like
GradientTape, automatic differentiation
See this part of the TensorFlow guide: https://www.tensorflow.org/guide/autodiff
apply_gradients()
See this part of the TensorFlow manual: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer
Put simply, this function subtracts the gradient from the variable in each (gradient, variable) pair to produce the variable's new value. That holds only for plain SGD with learning_rate=1 and no gradient clipping (clipvalue left unset); in general the amount subtracted is controlled by the optimizer's parameters. Consider the following program:
- import tensorflow as tf
-
- opt = tf.keras.optimizers.SGD(learning_rate=1.0)  # clipvalue not set: no clipping
- var1, var2 = tf.Variable(2.0), tf.Variable(2.0)
- with tf.GradientTape() as tape:
-     loss = 2 * var1 + 2 * var2
- grads = tape.gradient(loss, [var1, var2])
- print([grads[0].numpy(), grads[1].numpy()])
-
- opt.apply_gradients(zip(grads, [var1, var2]))
- # Without clipping we get [0.0, 0.0]. With clipvalue=1 the gradients would be
- # clipped to a maximum of 1 and we would get [1.0, 1.0].
- print([var1.numpy(), var2.numpy()])
In this program var1 = var2 = 2.0 initially and the loss is defined as loss = 2 * var1 + 2 * var2, so the derivatives of the loss with respect to var1 and var2 are 2 and 2. After "tape.gradient(loss, [var1, var2])" the printed gradients are therefore [2.0, 2.0]. Calling apply_gradients() then gives var1 = var1 - grads[0] = 2.0 - 2.0 = 0 and var2 = var2 - grads[1] = 2.0 - 2.0 = 0. This is the result for learning_rate = 1 with no clipping. With learning_rate = 1 and clipvalue = 1 the result is [1.0, 1.0], because each gradient is clipped to at most 1 before it is subtracted. With learning_rate = 0.1 and no clipping the result is [1.8, 1.8], because each gradient is multiplied by 0.1 before it is subtracted.
As an exercise: what is the result if learning_rate = 0.1 and clipvalue = 1?
[1.9, 1.9]
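The four cases discussed above can be checked without TensorFlow. The sketch below models a single plain-SGD update with optional gradient clipping; `sgd_step` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def sgd_step(var, grad, learning_rate=1.0, clipvalue=None):
    """One plain-SGD update with optional element-wise gradient clipping."""
    if clipvalue is not None:
        # Clip the gradient into [-clipvalue, clipvalue] before using it.
        grad = max(-clipvalue, min(clipvalue, grad))
    return var - learning_rate * grad

# (learning_rate, clipvalue) pairs in the order they are discussed above.
cases = [(1.0, None), (1.0, 1.0), (0.1, None), (0.1, 1.0)]
for lr, clip in cases:
    new_value = sgd_step(2.0, 2.0, learning_rate=lr, clipvalue=clip)
    print(f"learning_rate={lr}, clipvalue={clip}: {round(new_value, 6)}")
```

Clipping acts on the gradient first and the learning rate scales whatever is left, which is why the last case gives 2.0 - 0.1 * 1.0 = 1.9.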
reduce_mean()
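See this part of the TensorFlow manual: https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean
This function reduces a multi-dimensional array by averaging. When the second argument, axis, is omitted, it averages over all elements and returns a scalar. When axis is given, it averages along that axis and the result has one dimension fewer.
For example, the following program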
- x = tf.constant([[1., 1.], [2., 8.]])
- y = tf.reduce_mean(x)
- print(y)
- y = tf.reduce_mean(x, 0)
- print(y)
- y = tf.reduce_mean(x, 1)
- print(y)
prints:
- tf.Tensor(3.0, shape=(), dtype=float32)
- tf.Tensor([1.5 4.5], shape=(2,), dtype=float32)
- tf.Tensor([1. 5.], shape=(2,), dtype=float32)
tqdm
A small utility that displays a progress bar for Python programs. Try this short program from its documentation to see what it does.
- from tqdm import tqdm
- for i in tqdm(range(int(9e6))):
-     pass
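Fashion MNIST is a labeled dataset of 70,000 28x28-pixel grayscale images of fashion items such as shirts, trousers and sneakers (the program below loads only the 60,000-image training split). See the official link: Fashion MNIST | Kaggle
The program below builds a GAN from Dense layers on the Fashion MNIST dataset and uses it to generate 16 new images that resemble the dataset.
Import libraries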
- import os
- import time
- import numpy as np
- import tensorflow as tf
- import matplotlib.pyplot as plt
-
- from tqdm import tqdm
- from keras import layers, Sequential
- from keras.layers import Dense, ReLU, Reshape, Input, Flatten, LeakyReLU, Dropout
- from keras.datasets import fashion_mnist
Load the data
- (train_images, _),(_, _) = fashion_mnist.load_data()
- print(train_images.shape)
-
- # reshape to height, width, color channel (as gray, only 1 channel)
- train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
- print(train_images.shape)
Center and scale
- # center and scale the pixel values to the range [-1, 1]
- # the maximum pixel value is 255, so the midpoint is 255 / 2 = 127.5
- train_images = (train_images - 127.5) / 127.5
- print(train_images[56782, :10, :10])
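The scaling above and its inverse (used later when generated images are plotted) can be written as a pair of small functions. This is a sketch with hypothetical helper names, shown on single pixel values rather than whole arrays.

```python
def to_model_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5

def to_pixel_range(value):
    """Inverse mapping from [-1, 1] back to [0, 255]."""
    return (value + 1) * 127.5

for p in (0, 64, 127.5, 255):
    v = to_model_range(p)
    print(p, "->", round(v, 4), "->", round(to_pixel_range(v), 4))
```

The target range [-1, 1] matches the output range of the tanh activation that the generator uses in its last layer.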
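Look at one of the images
- # For a 2-D (single-channel) array, imshow maps the values through the colormap
- # after normalizing them, so the (-1, 1) range displays fine. The (0, 1)
- # restriction on float data applies only to RGB(A) images.
- plt.imshow(train_images[2567].squeeze(), cmap='gray')
It is displayed as follows: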
Build the TensorFlow dataset
- buffer_size = 60000  # the full training set, so the shuffle covers every image
- batch_size = 128
-
- # wrap the array in a tf.data.Dataset, which handles shuffling and batching
- train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(buffer_size).batch(batch_size)
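It helps to know how many batches this dataset yields per epoch and how large the last one is. The sketch below is plain arithmetic on the sizes used above; it assumes the 60,000-image training split and that `batch()` keeps the final partial batch, which is its default behavior.

```python
import math

num_images = 60000   # size of the Fashion MNIST training split
batch_size = 128     # same value as in the program above

num_batches = math.ceil(num_images / batch_size)
last_batch = num_images - (num_batches - 1) * batch_size
print(num_batches, last_batch)  # 469 96
```

The last batch therefore holds 96 real images, while `train_step` further down always draws `batch_size` noise vectors. The two losses are each averaged over their own batch, so this still runs; passing `drop_remainder=True` to `batch()` is one way to keep the sizes equal.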
- def generator_model():
-     model = Sequential()
-     #model.add(Dense(64, input_dim = 100))
-     #model.add(ReLU())
-     model.add(Dense(64, input_dim = 100, activation='relu'))
-
-     #model.add(Dense(128))
-     #model.add(ReLU())
-     model.add(Dense(128, activation='relu'))
-
-     #model.add(Dense(256))
-     #model.add(ReLU())
-     model.add(Dense(256, activation='relu'))
-
-     #Output layer
-     #As each centered & scaled value is from -1 to 1, we choose 'tanh'
-     model.add(Dense(784, activation='tanh'))
-     model.add(Reshape((28, 28, 1)))
-
-     return model
-
- generator = generator_model()
- generator.summary()
"generator" is the generator we want (it can be thought of as a function object). As the code shows, it is built from fully connected Dense layers.
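As a cross-check on what `generator.summary()` reports, the number of trainable parameters in a stack of Dense layers can be counted by hand: each layer has inputs x outputs weights plus one bias per output. `dense_param_count` is a hypothetical helper written for this sketch.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of Dense layers with biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 100-dimensional noise -> 64 -> 128 -> 256 -> 784 (= 28 * 28) outputs
generator_sizes = [100, 64, 128, 256, 784]
print(dense_param_count(generator_sizes))  # 249296
```

The Reshape layer adds no parameters. Applying the same function to the layer sizes of the discriminator defined further down, [784, 256, 128, 64, 1], gives 242177.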
If we use this generator at this point to produce an image from noise, the result looks as follows. The program is:
- #Given a specific noise, we use generator to generate a fake image
- noise = tf.random.normal([1, 100])
- generated_image = generator(noise, training = False)
- print(generated_image.shape)
-
- plt.imshow(generated_image[0, :, :, 0], cmap='gray')
The resulting image is
Because this generator has not been trained at all, what it produces is pure noise.
- def discriminator_model():
-     model = Sequential()
-     model.add(Input(shape=(28, 28, 1)))
-     model.add(Flatten())
-
-     # LeakyReLU usually trains and converges better than ReLU in GANs.
-     # 0.2 is the small slope used for negative inputs; it mitigates
-     # saturated or dead neurons.
-     # Dropout mitigates overfitting by switching off a fraction of the
-     # neurons; that fraction is the layer's argument.
-     model.add(Dense(256))
-     model.add(LeakyReLU(0.2))
-     model.add(Dropout(0.5))
-
-     model.add(Dense(128))
-     model.add(LeakyReLU(0.2))
-     model.add(Dropout(0.3))
-
-     model.add(Dense(64))
-     model.add(LeakyReLU(0.2))
-     model.add(Dropout(0.2))
-
-     # Output layer: a single score in (0, 1)
-     model.add(Dense(1, activation='sigmoid'))
-
-     return model
-
- discriminator = discriminator_model()
- discriminator.summary()
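"discriminator" is the discriminator we want (it can be thought of as a function object). As the code shows, it is built from fully connected Dense layers.
Likewise, we feed "generated_image", the image produced earlier from pure noise, into this untrained discriminator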
- output = discriminator(generated_image)
- print(output)
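The output is:
tf.Tensor([[0.464832]], shape=(1, 1), dtype=float32)
The value of this tensor is 0.46, very close to 0.5, which means the discriminator cannot tell whether the image is real or fake. Since it has not been trained, this is the expected result.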
- # Binary cross-entropy heavily penalizes confident misclassifications
- bce = tf.keras.losses.BinaryCrossentropy()
-
- # Real images are labeled 1 and fake images are labeled 0.
- # The discriminator's goal: score real images as real and fake images as fake.
- def discriminator_loss(real_output, fake_output):
-     real_loss = bce(tf.ones_like(real_output), real_output)
-     fake_loss = bce(tf.zeros_like(fake_output), fake_output)
-     total_loss = real_loss + fake_loss
-     return total_loss
-
- # The generator's goal: have the discriminator score its fake images as real
- def generator_loss(fake_output):
-     gen_loss = bce(tf.ones_like(fake_output), fake_output)
-     return gen_loss
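A useful reference point when reading the loss curves later is the value of these two losses when the discriminator is completely undecided, that is, when it outputs 0.5 for every image. The sketch below re-implements the two losses above in plain Python; the helper names are hypothetical and it operates on flat lists instead of tensors.

```python
import math

def bce(labels, probs):
    """Mean binary cross-entropy for flat lists of labels and probabilities."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)

def disc_loss_py(real_output, fake_output):
    real_loss = bce([1.0] * len(real_output), real_output)
    fake_loss = bce([0.0] * len(fake_output), fake_output)
    return real_loss + fake_loss

def gen_loss_py(fake_output):
    return bce([1.0] * len(fake_output), fake_output)

undecided = [0.5, 0.5, 0.5, 0.5]
d_loss = disc_loss_py(undecided, undecided)
g_loss = gen_loss_py(undecided)
print(round(d_loss, 3), round(g_loss, 3))  # 1.386 0.693
```

These values are 2 ln 2 and ln 2. A discriminator loss well below 1.386 means the discriminator is winning; a generator loss well below 0.693 means the generator is fooling it more often than not.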
Next, define the optimizers for the generator and the discriminator, which carry out the gradient-descent updates.
- generator_optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001)
- discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001)
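The learning rate of 0.001 scales the size of every update step. For plain SGD it would mean stepping by one thousandth of the current gradient; Adam additionally rescales each step using running averages of the gradient and of its square. See "apply_gradients()" under "Language Essentials".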
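How the learning rate acts under Adam can be seen from a single update. The sketch below follows the textbook Adam formulas for the very first step on one scalar gradient. It is a simplified illustration rather than the Keras implementation (Keras places epsilon slightly differently), and `adam_first_step` is a hypothetical helper; the default values match the documented Keras defaults for `beta_1`, `beta_2` and `epsilon`.

```python
import math

def adam_first_step(grad, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Size of the very first Adam update for a single scalar gradient."""
    m = (1 - beta1) * grad        # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2   # second-moment estimate after one step
    m_hat = m / (1 - beta1)       # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return learning_rate * m_hat / (math.sqrt(v_hat) + eps)

for g in (0.02, 2.0, 200.0):
    print(g, adam_first_step(g))
```

For all three gradients the first step is about 0.001 in size: Adam normalizes by the gradient's magnitude, so the learning rate sets the approximate step size instead of a fixed fraction of the gradient.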
The following code creates a checkpoint; its effect has not been verified
- # If training is interrupted, the saved state can be restored from the checkpoint
- checkpoint_dir = './'
- checkpoint_prefix = os.path.join(checkpoint_dir, 'chpt')
- checkpoint = tf.train.Checkpoint(generator_optimizer = generator_optimizer,
-                                  discriminator_optimizer = discriminator_optimizer,
-                                  generator = generator,
-                                  discriminator = discriminator)
- epochs = 50
- noise_dim= 100
- num_examples_to_generate = 16
Define the training step for each batch
- @tf.function
- def train_step(images):
-     noise = tf.random.normal([batch_size, noise_dim])
-
-     # GradientTape records operations for automatic differentiation
-     with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
-         generated_images = generator(noise, training = True)
-
-         real_output = discriminator(images, training = True)
-         fake_output = discriminator(generated_images, training = True)
-
-         disc_loss = discriminator_loss(real_output, fake_output)
-         gen_loss = generator_loss(fake_output)
-
-     # Differentiate gen_loss with respect to generator.trainable_variables
-     # (every tf.Module subclass collects its variables in trainable_variables),
-     # and disc_loss with respect to the discriminator's variables
-     gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
-     gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
-
-     # Each optimizer applies its gradients to the matching model's variables
-     generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
-     discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
-
-     return (gen_loss, disc_loss, tf.reduce_mean(real_output), tf.reduce_mean(fake_output))
After each epoch, the current model is checked on a given input and the result is shown as images
- def generate_and_plot_iamges(model, epoch, test_input):
-     predictions = model(test_input, training = False)
-     fig = plt.figure(figsize=(8, 4))
-
-     for i in range(predictions.shape[0]):
-         plt.subplot(4, 4, i+1)
-         # map the tanh output from [-1, 1] back to [0, 255]
-         pred = (predictions[i, :, :, 0] + 1) * 127.5
-         pred = np.array(pred)
-         plt.imshow(pred.astype(np.uint8), cmap = 'gray')
-         plt.axis('off')
-
-     plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
-     plt.show()
Below is the whole training procedure
- def train(dataset, epochs):
-     gen_loss_list = []
-     disc_loss_list = []
-
-     real_score_list = []
-     fake_score_list = []
-     for epoch in tqdm(range(epochs)):
-         start = time.time()
-         num_batches = len(dataset)
-         print(f'Training started with epoch {epoch + 1} with {num_batches} batches...')
-
-         total_gen_loss = 0
-         total_disc_loss = 0
-
-         for batch in dataset:
-             # local names differ from the generator_loss/discriminator_loss functions
-             gen_loss, disc_loss, real_score, fake_score = train_step(batch)
-             total_gen_loss += gen_loss
-             total_disc_loss += disc_loss
-
-         mean_gen_loss = total_gen_loss / num_batches
-         mean_disc_loss = total_disc_loss / num_batches
-
-         # losses are epoch means; real_score and fake_score come from the last batch
-         print('Losses after epoch %5d: generator %.3f, discriminator %.3f, real_score %.2f%%, fake_score %.2f%%' %
-               (epoch + 1, mean_gen_loss, mean_disc_loss, real_score * 100, fake_score * 100))
-
-         # Use 16 noise vectors to validate the current model visually
-         seed = tf.random.normal([num_examples_to_generate, noise_dim])
-         generate_and_plot_iamges(generator, epoch + 1, seed)
-
-         gen_loss_list.append(mean_gen_loss)
-         disc_loss_list.append(mean_disc_loss)
-         real_score_list.append(real_score)
-         fake_score_list.append(fake_score)
-
-         if (epoch + 1) % 10 == 0:
-             checkpoint.save(file_prefix = checkpoint_prefix)
-
-         print('Time for epoch {} is {} sec'.format(epoch+1, time.time()-start))
-
-     return gen_loss_list, disc_loss_list, real_score_list, fake_score_list
Run train() on the model
- gen_loss_epochs, disc_loss_epochs, real_score_list, fake_score_list = train(train_dataset,
-                                                                             epochs = epochs)
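Training is fairly slow. It is advisable to switch the Google Colab runtime to GPU, which speeds training up considerably: click Runtime, choose Change runtime type, select GPU, click Save, then run the program again.
The images printed during training show the generated images getting closer and closer to those in the original dataset. Below are the outputs after epoch 1 and epoch 50: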
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))
-
- ax1.plot(gen_loss_epochs, label = "Generator Loss", alpha = 0.5)
- ax1.plot(disc_loss_epochs, label = "Discriminator Loss", alpha = 0.5)
- ax1.set_title('Training Losses')
- ax1.legend()
-
- ax2.plot(real_score_list, label = "Real Score", alpha = 0.5)
- ax2.plot(fake_score_list, label = "Fake Score", alpha = 0.5)
- ax2.set_title('Accuracy Scores')
- ax2.legend()
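The result is as follows:
As training proceeds, the generator's loss decreases and the discriminator's loss increases, which indicates that the generated images are getting closer to real ones.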