Compared with AlexNet, VGG uses small convolution kernels and small pooling windows, is deeper, and has more channels. Each channel corresponds to one feature map, so more channels mean richer image features. The first stage of VGG has 64 channels, and the channel count doubles after each stage up to a maximum of 512; the growing number of channels allows more information to be extracted.
For a given receptive field, VGG replaces a large kernel with a stack of small ones: two 3x3 kernels cover the same receptive field as one 5x5 kernel, and three 3x3 kernels cover the same receptive field as one 7x7 kernel.
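As a quick sanity check (a small sketch added here, not part of the original post), stacking unpadded 3x3 convolutions in PyTorch shrinks the output by exactly as much as a single 5x5 or 7x7 kernel, which is another way of saying the stacks cover the same receptive field:

import torch
from torch import nn

x = torch.rand(1, 1, 32, 32)  # dummy single-channel input

# two unpadded 3x3 convs vs. one 5x5 conv: both shrink 32 -> 28
two_3x3 = nn.Sequential(nn.Conv2d(1, 1, 3), nn.Conv2d(1, 1, 3))
one_5x5 = nn.Conv2d(1, 1, 5)
print(two_3x3(x).shape, one_5x5(x).shape)      # both torch.Size([1, 1, 28, 28])

# three unpadded 3x3 convs vs. one 7x7 conv: both shrink 32 -> 26
three_3x3 = nn.Sequential(nn.Conv2d(1, 1, 3), nn.Conv2d(1, 1, 3), nn.Conv2d(1, 1, 3))
one_7x7 = nn.Conv2d(1, 1, 7)
print(three_3x3(x).shape, one_7x7(x).shape)    # both torch.Size([1, 1, 26, 26])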
Stacking small kernels is preferable to using a single large kernel: the extra depth lets the network learn more complex patterns, and the cost is comparatively small (fewer parameters).
As the table above shows, a large kernel does not bring a large increase in parameters: whether you look at the kernel weights or the feature maps separately, the sum of the two stays around 300,000 for every kernel size. In other words, kernel size has little effect on the parameter count; the totals are roughly level.
The convolutional layers have fewer parameters. Compared with large 5x5, 7x7 and 11x11 kernels, 3x3 clearly reduces the parameter count (refer back to the table above). For example, if both the input and output channel counts are C, three stacked conv3x3 layers need 3 x (C x 3 x 3 x C) = 27C² parameters, while a single conv7x7 layer needs C x 7 x 7 x C = 49C²; conv7x7 therefore has about (49 - 27) / 27 x 100% ≈ 81% more kernel parameters than the conv3x3 stack.
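This arithmetic is easy to verify directly; the following sketch (with C = 64 chosen arbitrarily for illustration) counts the weights of a three-layer conv3x3 stack against a single conv7x7:

from torch import nn

C = 64  # arbitrary channel count for illustration

def n_params(module):
    return sum(p.numel() for p in module.parameters())

three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, 3, padding=1, bias=False) for _ in range(3)])
one_7x7 = nn.Conv2d(C, C, 7, padding=3, bias=False)

print(n_params(three_3x3))   # 27 * C^2 = 110592
print(n_params(one_7x7))     # 49 * C^2 = 200704
print((49 - 27) / 27)        # ~0.81, i.e. conv7x7 has ~81% more weights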
What grows instead is the amount of computation. The table lists the formula for the computation; the trailing factor of 2 accounts for the multiply-add pairs. To keep the comparison as consistent as possible, every kernel here uses a stride of 4. The computation for conv3x3, conv5x5, conv7x7, conv9x9 and conv11x11 climbs from roughly 16 million multiply-adds through 45 million up to about 140 million and 200 million: at this scale, even though the parameter count barely grows, the amount of computation is striking.
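Since the table itself is not reproduced here, the following sketch shows how figures of this magnitude arise; the 224x224x3 input, 96 output channels and stride of 4 are assumptions chosen to mimic AlexNet's first layer, not values quoted from the table:

# Hypothetical multiply-add estimate for a single conv layer.
# Assumptions (not from the original post): 224x224x3 input, 96 output
# channels, stride 4, no padding; the factor of 2 counts multiply + add.
def conv_madds(k, h=224, w=224, c_in=3, c_out=96, stride=4):
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    return 2 * h_out * w_out * c_out * k * k * c_in

for k in (3, 5, 7, 9, 11):
    print('conv%dx%d: %.1e multiply-adds' % (k, k, conv_madds(k)))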
To sum up, we can draw two conclusions:
One key point is that stacking several small kernels gives an accuracy improvement over a single large kernel, and this is the most important point of all.
Advantages of using small convolution kernels
In the figure above, the text in bold marks the structure added relative to the previous network. The figure makes it clear that more layers mean more parameters and a more complex network.
From the configuration table in the figure above (summarized again in the sketch after this list) we can see:
A vs. A-LRN: local response normalization (LRN) has little effect on this architecture.
B vs. A: adding more convolutional layers improves accuracy.
C vs. B: the 1x1 convolutions add non-linearity to the decision function without affecting the receptive field; in addition, the third training setting for C adds scale jittering (the training scale S is sampled from [S_min, S_max] rather than held fixed), which also improves accuracy.
E vs. D: adding more convolutional layers improves top-1 accuracy but increases the top-5 error.
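For reference, the configurations compared above can be summarized as follows (a sketch based on Table 1 of the VGG paper; it only encodes the number of conv layers per stage):

# Number of conv layers in each of the five stages; every configuration
# is followed by 3 fully connected layers.
vgg_configs = {
    'A':     (1, 1, 2, 2, 2),   # 11 weight layers
    'A-LRN': (1, 1, 2, 2, 2),   # same as A, plus local response normalization
    'B':     (2, 2, 2, 2, 2),   # 13 weight layers
    'C':     (2, 2, 3, 3, 3),   # 16 layers; the 3rd conv of the last three stages is 1x1
    'D':     (2, 2, 3, 3, 3),   # 16 layers, all convs 3x3 (VGG-16)
    'E':     (2, 2, 4, 4, 4),   # 19 weight layers (VGG-19)
}
for name, convs in vgg_configs.items():
    print(name, 'total weight layers:', sum(convs) + 3)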
#!/usr/bin/env python
# coding: utf-8

import time
import torch
import torchvision
from torch import nn, optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

get_ipython().system('nvidia-smi')  # notebook-only: show GPU status
torch.cuda.get_device_name(0)


def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the Fashion-MNIST dataset and then load it into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))  # resize the PIL image
    trans.append(torchvision.transforms.ToTensor())  # convert (H x W x C) PIL image to (C x H x W) FloatTensor
    transform = torchvision.transforms.Compose(trans)  # chain the transforms together
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)
    return train_iter, test_iter


def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, nn.Module):
        device = list(net.parameters())[0].device  # evaluate on the device the net lives on
    acc_sum, n = 0.0, 0
    with torch.no_grad():  # disable gradient tracking
        for X, y in data_iter:
            if isinstance(net, nn.Module):
                net.eval()  # switch off BatchNormalization and Dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                net.train()  # switch BatchNormalization and Dropout back on
            else:
                if 'is_training' in net.__code__.co_varnames:
                    acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
                else:
                    acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n


def train(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on ", device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()      # back-propagate to get the gradients
            optimizer.step()  # update the parameters
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()  # y_hat.argmax(dim=1) is the predicted class
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))


def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*blk)  # (conv + relu) repeated, then maxpool


conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
# After 5 vgg_blocks the width and height are halved 5 times: 224 / 32 = 7
fc_features = 512 * 7 * 7  # c * w * h
fc_hidden_units = 4096


class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):
        return x.view(x.shape[0], -1)


def vgg(conv_arch, fc_features, fc_hidden_units=4096):
    net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        net.add_module('vgg_block_' + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
    net.add_module('fc', nn.Sequential(FlattenLayer(),
                                       nn.Linear(fc_features, fc_hidden_units),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_hidden_units, fc_hidden_units),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_hidden_units, 10)))
    return net


net = vgg(conv_arch, fc_features, fc_hidden_units)
X = torch.rand(1, 1, 224, 224)
print(net)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape:', X.shape)

batch_size = 64
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
lr, num_epoch = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train(net, train_iter, test_iter, batch_size, optimizer, device, num_epoch)

ratio = 8
small_conv_arch = [(1, 1, 64 // ratio), (1, 64 // ratio, 128 // ratio), (2, 128 // ratio, 256 // ratio),
                   (2, 256 // ratio, 512 // ratio), (2, 512 // ratio, 512 // ratio)]  # fewer channels
net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)
print(net)

batch_size = 64
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
lr, num_epoch = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train(net, train_iter, test_iter, batch_size, optimizer, device, num_epoch)
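The conv_arch used above is a single-channel, slimmed-down variant for Fashion-MNIST. As an illustrative sketch (not part of the original code), the same vgg() builder can also express the paper's configuration-D (VGG-16) block layout for a 3-channel 224x224 input; note the classifier still ends in the 10-way output layer hard-coded in vgg() above:

# Hypothetical: configuration-D (VGG-16) conv block counts for 3-channel input.
# The fully connected part is unchanged, so fc_features is still 512 * 7 * 7.
vgg16_conv_arch = ((2, 3, 64), (2, 64, 128), (3, 128, 256),
                   (3, 256, 512), (3, 512, 512))
vgg16 = vgg(vgg16_conv_arch, fc_features=512 * 7 * 7, fc_hidden_units=4096)

X = torch.rand(1, 3, 224, 224)
for name, blk in vgg16.named_children():
    X = blk(X)
    print(name, 'output shape:', X.shape)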