The VGG network replaces large convolution kernels with repeated stacks of small ones. With the receptive field held constant, this increases the network's depth and thereby its feature-extraction capacity.
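As a quick sanity check of this claim (a minimal sketch; the channel count 64 is illustrative, not from the text), two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution while using fewer parameters:

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

c = 64  # illustrative channel count
one_5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2)
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)
# 5x5: 64*64*25 + 64 = 102,464 params; two 3x3: 2*(64*64*9 + 64) = 73,856
print(n_params(one_5x5), n_params(two_3x3))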
A VGG network can be viewed as a stack of vgg_blocks, where each vgg_block consists of several convolution + ReLU layers followed by one pooling layer. The number in a VGG network's name counts the parameterized layers in the whole network (convolutional and fully connected layers; pooling layers are not counted).
A classic VGG16 architecture diagram is shown here; the structure diagrams for the other networks in the series can be found in the same reference.
From the architecture diagram, the pattern of a VGG block is: several identical 3×3 convolutional layers with padding 1, followed by one 2×2 max-pooling layer with stride 2. The convolutional layers keep the input height and width unchanged, while the pooling layer halves them. We therefore first implement this basic VGG block as a vgg_block function, which takes the number of convolutional layers and the input and output channel counts:
import torch
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU(inplace=True))
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves height and width
    return nn.Sequential(*blk)
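A quick usage check (a minimal sketch with an illustrative 224×224 input): the block maps the channel count to out_channels and halves the spatial dimensions:

blk = vgg_block(2, 3, 64)
x = torch.randn(1, 3, 224, 224)
print(blk(x).shape)  # torch.Size([1, 64, 112, 112])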
Again from the diagram, a VGG network is five VGG blocks in series, followed by flattening and three fully connected layers. The number of convolutional layers differs from block to block, and this is exactly where the VGG variants differ. For the VGG16 shown, the five blocks contain (2, 2, 3, 3, 3) convolutional layers; adding the 3 fully connected layers gives 16 parameterized layers in total, hence the name VGG16.
The input/output channel counts of the five VGG blocks are (3, 64), (64, 128), (128, 256), (256, 512), (512, 512).
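Since each of the five blocks halves the spatial dimensions, a 224×224 input shrinks to 224 / 2^5 = 7, which is why the first fully connected layer below takes 512 * 7 * 7 input features (a simple arithmetic check):

print(224 // 2**5)        # 7
print(512 * 7 * 7)        # 25088, the flattened feature size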
Below we implement VGGNet, where block is the VGG block and the parameter layers is a list specifying the number of convolutional layers in each VGG block:
class VGGNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(VGGNet, self).__init__()
        # Five VGG blocks; channel counts follow (3, 64), (64, 128), ..., (512, 512)
        self.vgg_block_1 = block(layers[0], 3, 64)
        self.vgg_block_2 = block(layers[1], 64, 128)
        self.vgg_block_3 = block(layers[2], 128, 256)
        self.vgg_block_4 = block(layers[3], 256, 512)
        self.vgg_block_5 = block(layers[4], 512, 512)
        # Classifier: three fully connected layers with dropout
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )
        # Weight initialization: Kaiming for conv layers, small normal for FC layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.vgg_block_1(x)
        x = self.vgg_block_2(x)
        x = self.vgg_block_3(x)
        x = self.vgg_block_4(x)
        x = self.vgg_block_5(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        return self.fc(x)
Note that the weights are initialized appropriately in __init__; without this, the network is hard to train.
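As a quick smoke test (a minimal sketch; the (2, 2, 3, 3, 3) configuration is the VGG16 one from above), we can run a random batch through the model and check the output shape:

net = VGGNet(vgg_block, [2, 2, 3, 3, 3], num_classes=1000)
x = torch.randn(2, 3, 224, 224)
print(net(x).shape)  # torch.Size([2, 1000])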
To capture the differences between the VGG variants (each uses a different number of convolutional layers per block), we next define the VGG11, VGG13, VGG16, and VGG19 networks:
def vgg11(**kwargs):
    model = VGGNet(vgg_block, [1, 1, 2, 2, 2], **kwargs)
    return model

def vgg13(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 2, 2, 2], **kwargs)
    return model

def vgg16(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 3, 3, 3], **kwargs)
    return model

def vgg19(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 4, 4, 4], **kwargs)
    return model
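The suffix in each variant's name should equal its number of parameterized layers. A small check (a sketch; the dict below just restates the configurations above):

configs = {"vgg11": [1, 1, 2, 2, 2], "vgg13": [2, 2, 2, 2, 2],
           "vgg16": [2, 2, 3, 3, 3], "vgg19": [2, 2, 4, 4, 4]}
for name, layers in configs.items():
    assert sum(layers) + 3 == int(name[3:])  # conv layers + 3 FC layers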
Taking VGG16 as an example, print the network structure with torchsummary:
if __name__ == "__main__":
    from torchsummary import summary
    net = vgg16(num_classes=2)
    net.cuda()
    summary(net, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                    [-1, 2]           8,194
================================================================
Total params: 134,268,738
Trainable params: 134,268,738
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.58
Params size (MB): 512.19
Estimated Total Size (MB): 731.35
----------------------------------------------------------------
This matches the architecture diagram.
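The per-layer parameter counts are easy to verify by hand, for example for the first convolution and the first fully connected layer:

# Conv2d-1: 3x3 kernel, 3 in-channels, 64 out-channels, plus 64 biases
print(3 * 3 * 3 * 64 + 64)         # 1792
# Linear-32: 512*7*7 inputs, 4096 outputs, plus 4096 biases
print(512 * 7 * 7 * 4096 + 4096)   # 102764544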
Now let's train a VGG16 model from scratch to classify the hotdog dataset:
import time

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import transforms, datasets

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_dir = "../data/hotdog/train"
test_dir = "../data/hotdog/test"

# Resize images to 224×224 and normalize
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
train_augs = transforms.Compose([
    transforms.RandomResizedCrop(size=224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
test_augs = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

train_set = datasets.ImageFolder(train_dir, transform=train_augs)
test_set = datasets.ImageFolder(test_dir, transform=test_augs)

batch_size = 32
train_iter = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_set, batch_size=batch_size)

def train(net, train_iter, test_iter, criterion, optimizer, num_epochs):
    net = net.to(device)
    print("training on", device)
    for epoch in range(num_epochs):
        start = time.time()
        net.train()  # training mode
        train_loss_sum, train_acc_sum, n, batch_count = 0.0, 0.0, 0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()  # clear gradients
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()
            train_loss_sum += loss.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        with torch.no_grad():
            net.eval()  # evaluation mode
            test_acc_sum, n2 = 0.0, 0
            for X, y in test_iter:
                test_acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                n2 += y.shape[0]
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_loss_sum / batch_count, train_acc_sum / n,
                 test_acc_sum / n2, time.time() - start))

from vgg import vgg16

net = vgg16(num_classes=2)
optimizer = optim.SGD(net.parameters(), lr=0.01)
loss = nn.CrossEntropyLoss()
train(net, train_iter, test_iter, loss, optimizer, num_epochs=5)
Training log:
training on cuda
epoch 1, loss 0.6647, train acc 0.583, test acc 0.636, time 43.1 sec
epoch 2, loss 0.5443, train acc 0.750, test acc 0.824, time 43.1 sec
epoch 3, loss 0.4223, train acc 0.809, test acc 0.836, time 43.5 sec
epoch 4, loss 0.4130, train acc 0.822, test acc 0.815, time 43.3 sec
epoch 5, loss 0.3920, train acc 0.830, test acc 0.843, time 43.3 sec