Hello everyone, I'm 微学AI. Today I'd like to introduce Lesson 5 of the Senior AI Algorithm Engineer course: an image-generation project on adversarial generative models, with a detailed code walkthrough. This article explains the mathematics behind generative adversarial networks (GANs) and their variants CGAN and DCGAN, and builds complete, runnable implementations in PyTorch to help readers master the principles and techniques of image generation.
The generative adversarial network (GAN), proposed by Goodfellow et al. in 2014, consists of two parts: a generator and a discriminator. The generator's task is to produce samples as close to real data as possible, while the discriminator's task is to distinguish generated samples from real ones.
The GAN objective function is:
V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]
where D(x) is the probability the discriminator assigns to x being a real sample, G(z) is the sample produced by the generator, and z is random noise.
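To make the two expectation terms concrete, the value function can be evaluated numerically on a handful of discriminator outputs. This is only an illustrative sketch: d_real and d_fake below are made-up probabilities standing in for D(x) and D(G(z)), not the outputs of a trained network.

```python
import torch

torch.manual_seed(0)

# Stand-in discriminator outputs in (0, 1): d_real ~ D(x), d_fake ~ D(G(z))
d_real = torch.sigmoid(torch.randn(8))
d_fake = torch.sigmoid(torch.randn(8))

# V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
v = torch.mean(torch.log(d_real)) + torch.mean(torch.log(1 - d_fake))

# The discriminator ascends V; equivalently it minimizes -V,
# which is exactly the discriminator loss used in the training code below.
d_loss = -v
print(v.item(), d_loss.item())
```

Both log terms are negative (their arguments lie in (0, 1)), so V itself is always negative; the discriminator pushes it toward 0 while the generator pushes it back down.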
First, import the required libraries:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
Define the generator:
class Generator(nn.Module):
    def __init__(self, z_dim, img_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)
Define the discriminator:
class Discriminator(nn.Module):
    def __init__(self, img_dim):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(img_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
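Before training, it is worth sanity-checking tensor shapes end to end. The sketch below assumes z_dim = 100 and flattened 28x28 MNIST images (img_dim = 784); the two networks are re-declared compactly as plain nn.Sequential stacks so the snippet runs on its own, but they match the Generator and Discriminator classes above layer for layer.

```python
import torch
import torch.nn as nn

# Compact stand-ins for the Generator/Discriminator classes above
def make_generator(z_dim, img_dim):
    return nn.Sequential(
        nn.Linear(z_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 512), nn.LeakyReLU(0.2),
        nn.Linear(512, 1024), nn.LeakyReLU(0.2),
        nn.Linear(1024, img_dim), nn.Tanh())

def make_discriminator(img_dim):
    return nn.Sequential(
        nn.Linear(img_dim, 1024), nn.LeakyReLU(0.2), nn.Dropout(0.3),
        nn.Linear(1024, 512), nn.LeakyReLU(0.2), nn.Dropout(0.3),
        nn.Linear(512, 256), nn.LeakyReLU(0.2), nn.Dropout(0.3),
        nn.Linear(256, 1), nn.Sigmoid())

G, D = make_generator(100, 784), make_discriminator(784)
z = torch.randn(4, 100)          # a batch of 4 noise vectors
fake = G(z)                      # (4, 784), values in (-1, 1) from Tanh
score = D(fake)                  # (4, 1),   values in (0, 1) from Sigmoid
print(tuple(fake.shape), tuple(score.shape))
```

The Tanh output range matching the Normalize((0.5,), (0.5,)) preprocessing (which maps pixels to [-1, 1]) is what lets the discriminator see real and fake images on the same scale.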
Train the GAN model:
# Hyperparameters
z_dim = 100
img_dim = 28 * 28
batch_size = 64
lr = 0.0002
epochs = 50

# Load the dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

# Initialize the generator and discriminator
G = Generator(z_dim, img_dim)
D = Discriminator(img_dim)

# Define the optimizers
optimizer_G = optim.Adam(G.parameters(), lr=lr)
optimizer_D = optim.Adam(D.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, _) in enumerate(dataloader):
        # Train the discriminator; use imgs.size(0) so the last,
        # possibly smaller, batch still matches the noise batch
        real_imgs = imgs.view(-1, img_dim)
        z = torch.randn(imgs.size(0), z_dim)
        fake_imgs = G(z)

        D_real = D(real_imgs)
        D_fake = D(fake_imgs.detach())  # detach: no generator gradients here
        D_loss = -torch.mean(torch.log(D_real) + torch.log(1 - D_fake))

        optimizer_D.zero_grad()
        D_loss.backward()
        optimizer_D.step()

        # Train the generator
        z = torch.randn(imgs.size(0), z_dim)
        fake_imgs = G(z)
        D_fake = D(fake_imgs)
        G_loss = -torch.mean(torch.log(D_fake))

        optimizer_G.zero_grad()
        G_loss.backward()
        optimizer_G.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'D_loss: {D_loss.item():.4f}, G_loss: {G_loss.item():.4f}')
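After training, new digits are produced by feeding fresh noise through the trained generator. The sketch below shows the sampling step; the single-layer network here is only a stand-in with the same interface as the trained G above, so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Stand-in for the trained generator (same input/output shapes)
G = nn.Sequential(nn.Linear(100, 28 * 28), nn.Tanh())

G.eval()
with torch.no_grad():                    # no gradients needed at sampling time
    z = torch.randn(16, 100)
    samples = G(z).view(-1, 1, 28, 28)   # back to image layout (N, C, H, W)

# Map Tanh output from [-1, 1] to [0, 1] for display or saving,
# undoing the Normalize((0.5,), (0.5,)) applied to the training data
samples = (samples + 1) / 2
print(tuple(samples.shape))

# Optionally: torchvision.utils.save_image(samples, 'samples.png', nrow=4)
```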
The conditional GAN (CGAN) extends the GAN by feeding extra conditioning information y to both the generator and the discriminator. The CGAN objective function is:
V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z|y)))]
where D(x|y) is the probability, given the condition y, that the discriminator judges x to be a real sample, and G(z|y) is the sample the generator produces given the condition y.
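For MNIST, the natural condition y is the class label, one-hot encoded so it can be concatenated with the noise z (for the generator) or with the flattened image (for the discriminator). A minimal sketch of that encoding, assuming z_dim = 100 and ten classes:

```python
import torch
import torch.nn.functional as F

# Class labels for a batch of three samples
labels = torch.tensor([3, 7, 0])

# One-hot encode: each row has a single 1 at the label's index
y = F.one_hot(labels, num_classes=10).float()   # shape (3, 10)

# Generator input is the concatenation of noise and condition
z = torch.randn(3, 100)
z_y = torch.cat([z, y], dim=1)                  # shape (3, 110)
print(tuple(y.shape), tuple(z_y.shape))
```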
The CGAN generator and discriminator extend the originals by taking the condition y as an extra input. Here is the CGAN implementation:
class CGenerator(nn.Module):
    def __init__(self, z_dim, condition_dim, img_dim):
        super(CGenerator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(z_dim + condition_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
            nn.Tanh()
        )

    def forward(self, z, y):
        z_y = torch.cat([z, y], 1)
        return self.model(z_y)


class CDiscriminator(nn.Module):
    def __init__(self, img_dim, condition_dim):
        super(CDiscriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(img_dim + condition_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x, y):
        x_y = torch.cat([x, y], 1)
        return self.model(x_y)
When training the CGAN, the condition y must be passed to both the generator and the discriminator:
# Assume the condition y has dimension 10 (one class per MNIST digit)
condition_dim = 10

# Initialize the conditional generator and discriminator
CG = CGenerator(z_dim, condition_dim, img_dim)
CD = CDiscriminator(img_dim, condition_dim)

# Define the optimizers
optimizer_CG = optim.Adam(CG.parameters(), lr=lr)
optimizer_CD = optim.Adam(CD.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, labels) in enumerate(dataloader):
        # One-hot encode the labels as the condition
        y = torch.nn.functional.one_hot(labels, num_classes=condition_dim).float()
        real_imgs = imgs.view(-1, img_dim)

        # Train the discriminator
        z = torch.randn(imgs.size(0), z_dim)
        fake_imgs = CG(z, y)

        CD_real = CD(real_imgs, y)
        CD_fake = CD(fake_imgs.detach(), y)
        CD_loss = -torch.mean(torch.log(CD_real) + torch.log(1 - CD_fake))

        optimizer_CD.zero_grad()
        CD_loss.backward()
        optimizer_CD.step()

        # Train the generator
        z = torch.randn(imgs.size(0), z_dim)
        fake_imgs = CG(z, y)
        CD_fake = CD(fake_imgs, y)
        CG_loss = -torch.mean(torch.log(CD_fake))

        optimizer_CG.zero_grad()
        CG_loss.backward()
        optimizer_CG.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'CD_loss: {CD_loss.item():.4f}, CG_loss: {CG_loss.item():.4f}')
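Once trained, the condition lets you choose which digit to generate: fix y to one class for the whole batch and vary only z. The sketch below uses a tiny stand-in network with the same forward(z, y) interface as CGenerator above, so it runs on its own; in practice the trained CG would take its place.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in with the same interface as the trained CGenerator (z_dim=100, condition_dim=10)
class TinyCG(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(110, 28 * 28)

    def forward(self, z, y):
        return torch.tanh(self.fc(torch.cat([z, y], dim=1)))

CG = TinyCG()

digit = 5                                     # the class we want to generate
y = F.one_hot(torch.full((8,), digit, dtype=torch.long), num_classes=10).float()
z = torch.randn(8, 100)                       # different noise -> different styles
with torch.no_grad():
    imgs = CG(z, y).view(-1, 1, 28, 28)       # 8 samples, all conditioned on "5"
print(tuple(imgs.shape))
```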
The deep convolutional GAN (DCGAN) is a GAN variant that applies convolutional neural networks (CNNs) to the generator and discriminator. Its objective function is the same as the GAN's, but the network architecture differs, letting both networks handle image data much more effectively.
Here is the implementation of the DCGAN generator and discriminator:
class DCGenerator(nn.Module):
    def __init__(self, z_dim, img_channels):
        super(DCGenerator, self).__init__()
        self.model = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0, bias=False),        # 1x1 -> 4x4
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),          # 4x4 -> 8x8
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),           # 8x8 -> 16x16
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1, bias=False),  # 16x16 -> 32x32
            nn.Tanh()
        )

    def forward(self, z):
        z = z.view(z.size(0), z.size(1), 1, 1)
        return self.model(z)


class DCDiscriminator(nn.Module):
    def __init__(self, img_channels):
        super(DCDiscriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(img_channels, 64, 4, 2, 1, bias=False),           # 32x32 -> 16x16
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),                    # 16x16 -> 8x8
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),                   # 8x8 -> 4x4
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),                     # 4x4 -> 1x1
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img).view(img.size(0), -1)


# Hyperparameters; img_size must be 32 to match the generator's 32x32 output
z_dim = 100
img_channels = 1
img_size = 32
batch_size = 64
lr = 0.0002
epochs = 50

# Load the dataset, resizing MNIST from 28x28 to 32x32
transform = transforms.Compose([
    transforms.Resize(img_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

# Initialize the generator and discriminator
DG = DCGenerator(z_dim, img_channels)
DD = DCDiscriminator(img_channels)

# Define the optimizers
optimizer_DG = optim.Adam(DG.parameters(), lr=lr)
optimizer_DD = optim.Adam(DD.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, _) in enumerate(dataloader):
        # Train the discriminator
        real_imgs = imgs.view(-1, img_channels, img_size, img_size)
        z = torch.randn(imgs.size(0), z_dim, 1, 1)
        fake_imgs = DG(z)

        DD_real = DD(real_imgs)
        DD_fake = DD(fake_imgs.detach())
        DD_loss = -torch.mean(torch.log(DD_real) + torch.log(1 - DD_fake))

        optimizer_DD.zero_grad()
        DD_loss.backward()
        optimizer_DD.step()

        # Train the generator
        z = torch.randn(imgs.size(0), z_dim, 1, 1)
        fake_imgs = DG(z)
        DD_fake = DD(fake_imgs)
        DG_loss = -torch.mean(torch.log(DD_fake))

        optimizer_DG.zero_grad()
        DG_loss.backward()
        optimizer_DG.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'DD_loss: {DD_loss.item():.4f}, DG_loss: {DG_loss.item():.4f}')
In the code above, DCGenerator uses a stack of ConvTranspose2d layers to progressively upsample the image, while DCDiscriminator uses a stack of Conv2d layers to progressively downsample it. Each convolution is followed by batch normalization (BatchNorm) and an activation (ReLU in the generator, LeakyReLU in the discriminator); these are key ingredients of DCGAN and help stabilize training.
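The layer-by-layer sizes follow from the standard formulas: for ConvTranspose2d, out = (in − 1)·stride − 2·padding + kernel, and for Conv2d, out = (in − kernel + 2·padding)/stride + 1. Tracing the DCGenerator above from its 1×1 latent input confirms the spatial sizes and shows why the training images need to be 32×32 to match its output (generating 64×64 images would require a fifth transposed-convolution layer):

```python
def deconv_out(size, kernel, stride, padding):
    # Output size of nn.ConvTranspose2d (dilation=1, output_padding=0)
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the four ConvTranspose2d layers in DCGenerator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size, trace = 1, []
for kernel, stride, padding in layers:
    size = deconv_out(size, kernel, stride, padding)
    trace.append(size)

print(trace)   # [4, 8, 16, 32]
```

The discriminator runs the same arithmetic in reverse, 32 → 16 → 8 → 4 → 1, which is why its final 4×4 convolution with stride 1 and no padding collapses the feature map to a single score.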
This article covered the mathematics of GAN, CGAN, and DCGAN and provided complete PyTorch implementations. Working through this code gives a concrete picture of how adversarial generative models are trained and of the principles and techniques behind image generation. In practice, these models are used for image synthesis, style transfer, data augmentation, and more. Note that GAN training can be unstable, so in real projects you may need to tune hyperparameters and model architecture to get the best results.