Proposed by Google in 2014.
It appeared the same year as VGG and won first place in ILSVRC (ImageNet) 2014, with VGG taking second.
GoogLeNet is also known as Inception V1. It is spelled GoogLeNet rather than GoogleNet as a tribute to LeCun's LeNet.
GoogLeNet: networks with parallel concatenations
It was hugely influential at the time because its structure was unprecedented: it overturned the fixed habit of simply stacking convolutional layers in series and introduced the very effective Inception module. The result is a network deeper than VGG yet with fewer parameters, since it drops the large fully connected layers at the end, which greatly reduces the parameter count while keeping computation efficient. GoogLeNet is 22 layers deep with roughly 12× fewer parameters than AlexNet (on the order of 5M versus AlexNet's roughly 60M).
Although its name pays tribute to LeNet, its structure bears little resemblance to LeNet. GoogLeNet absorbed NiN's idea of chaining networks in series and improved on it substantially.
Auxiliary classifiers
GoogLeNet uses auxiliary classifiers. The network is 22 layers deep; besides the final output, the intermediate layers' representations can already be quite discriminative, so GoogLeNet attaches classifiers to intermediate layers and adds their losses to the final classification loss with a small weight (0.3). There are 2 such auxiliary classifier nodes in total.
AlexNet and VGG each have only 1 output layer; GoogLeNet has 3, two of which are auxiliary classification layers.
As shown in the figure below, the two branches to the right of the network trunk are the auxiliary classifiers, and their structures are identical. During training, the losses of the two auxiliary classifiers are multiplied by a weight (0.3 in the paper) and added to the overall network loss before backpropagation.
The inputs of the two auxiliary classifiers come from Inception(4a) and Inception(4d), respectively.
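As a minimal sketch of how this looks in training code (the full implementation is in the flower-dataset section later in this post; `net`, `criterion`, `images`, and `labels` are assumed to be defined):

```python
# in train mode the network returns (main_logits, aux2_logits, aux1_logits)
logits, aux_logits2, aux_logits1 = net(images)
loss_main = criterion(logits, labels)
loss_aux1 = criterion(aux_logits1, labels)  # branch fed by Inception(4a)
loss_aux2 = criterion(aux_logits2, labels)  # branch fed by Inception(4d)
# auxiliary losses are discounted by the paper's weight of 0.3
loss = loss_main + 0.3 * (loss_aux1 + loss_aux2)
loss.backward()
```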
Are there fully connected layers?
Before the classifier, GoogLeNet adopts Network in Network's idea of replacing fully connected layers with average pooling; after the average pooling it still keeps a single fully connected layer, to make fine-tuning easier.
VGG, LeNet, and AlexNet, by contrast, all end with three consecutive fully connected layers, whose input is the reshaped output of the preceding convolutional layers. When sources say GoogLeNet removed the "expensive fc layers", they mean the first two computation-heavy fc layers.
It was found that replacing the fully connected layer with an average pooling layer improved top-1 accuracy by about 0.6%; however, even with the fully connected layers removed, dropout remained necessary.
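To see why dropping the big fc layers saves so much, here is a back-of-the-envelope comparison (a sketch; the 7×7×512 input of VGG-16's first fc layer and GoogLeNet's 1024-channel global average pooling are taken from the respective architectures):

```python
# VGG-16: the first fully connected layer alone
vgg_fc1_weights = 7 * 7 * 512 * 4096       # ~102.8M parameters
# GoogLeNet: global average pooling (0 parameters) + one fc layer
googlenet_fc_weights = 1024 * 1000         # ~1.0M parameters
print(vgg_fc1_weights / googlenet_fc_weights)  # ~100x fewer at the output stage
```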
Parameter table
Later versions of GoogLeNet/Inception
v1: the earliest version
v2: added batch normalization to speed up training
v3: further adjustments to the Inception module
v4: combined the Inception module with residual connections in the style of ResNet (Inception-ResNet)
The basic convolutional block in GoogLeNet is called the Inception block, named after the movie Inception. Compared with the NiN block, this basic block is structurally more complex.
Actually the left side of the figure is the original Inception structure; the right side should be called Inception with dimensionality reduction.
An Inception block contains 4 parallel paths. The first 3 paths use convolutional layers with window sizes of 1×1, 3×3, and 5×5 to extract information at different spatial scales; the middle 2 of these paths first apply a 1×1 convolution to the input to reduce the number of input channels and hence the model complexity. The fourth path uses a 3×3 max pooling layer followed by a 1×1 convolution to change the channel count. All 4 paths use appropriate padding so that the outputs keep the input's height and width. Finally, the outputs of all paths are concatenated along the channel dimension and passed on to the next layer.
Note that the feature maps from the four paths are not added together but concatenated.
The core idea of the Inception module: use convolution kernels of different sizes to perceive the input at different scales, then fuse the results to obtain a richer representation of the image.
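A quick sketch of that channel-wise fusion, using the channel counts of Inception(3a) (64, 128, 32, 32) on 28×28 feature maps:

```python
import torch

# dummy outputs of the four paths of Inception(3a)
f1 = torch.zeros(1, 64, 28, 28)   # 1x1 path
f2 = torch.zeros(1, 128, 28, 28)  # 3x3 path
f3 = torch.zeros(1, 32, 28, 28)   # 5x5 path
f4 = torch.zeros(1, 32, 28, 28)   # pool path
out = torch.cat((f1, f2, f3, f4), dim=1)  # concatenated, not summed
print(out.shape)  # torch.Size([1, 256, 28, 28]): 64 + 128 + 32 + 32 = 256
```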
The main idea behind the Inception structure is to approximate the optimal local sparse structure using readily available dense components.
Network implementation (simplified)
The Fashion-MNIST dataset
GoogLeNet can be viewed as a series of many Inception modules.
GoogLeNet connects multiple carefully designed Inception blocks and other layers in series. The channel allocation ratios within the Inception blocks were obtained through extensive experiments on the ImageNet dataset.
GoogLeNet and its successors were for a time among the most efficient models on ImageNet: at comparable test accuracy, their computational complexity was often lower.
The code implements the GoogLeNet network shown in the figure below.
The original paper uses multiple outputs to combat vanishing gradients; here we define a simplified GoogLeNet with a single output.
The input size must be 96×96; the number of channels doesn't matter. Whether you use CIFAR-10 or Fashion-MNIST, resize the images to 96×96.
For the 1-channel Fashion-MNIST dataset at 96×96, my 4 GB GPU handles a batch_size of 32.
```python
import torch
from torch import nn, optim
import torch.nn.functional as F
from datetime import datetime
import torchvision


class inception(nn.Module):
    def __init__(self, in_channel, out1_1, out2_1, out2_3, out3_1, out3_5, out4_1):
        super(inception, self).__init__()
        # nn.Conv2d defaults: stride=1, padding=0
        # path 1: 1x1 conv
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_channel, out1_1, kernel_size=1),
            nn.BatchNorm2d(out1_1, eps=1e-3),
            nn.ReLU(True)
        )
        # path 2: 1x1 conv -> 3x3 conv
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channel, out2_1, kernel_size=1),
            nn.BatchNorm2d(out2_1, eps=1e-3),
            nn.ReLU(True),
            nn.Conv2d(out2_1, out2_3, kernel_size=3, padding=1),
            nn.BatchNorm2d(out2_3, eps=1e-3),
            nn.ReLU(True),
        )
        # path 3: 1x1 conv -> 5x5 conv
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channel, out3_1, kernel_size=1),
            nn.BatchNorm2d(out3_1, eps=1e-3),
            nn.ReLU(True),
            nn.Conv2d(out3_1, out3_5, kernel_size=5, padding=2),
            nn.BatchNorm2d(out3_5, eps=1e-3),
            nn.ReLU(True),
        )
        # path 4: 3x3 max pool -> 1x1 conv
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_channel, out4_1, kernel_size=1),
            nn.BatchNorm2d(out4_1, eps=1e-3),
            nn.ReLU(True),
        )

    def forward(self, x):
        f1 = self.branch1x1(x)
        f2 = self.branch3x3(x)
        f3 = self.branch5x5(x)
        f4 = self.branch_pool(x)
        # concatenate along the channel dimension, not elementwise add
        output = torch.cat((f1, f2, f3, f4), dim=1)
        return output


# test_net = inception(3, 64, 48, 64, 64, 96, 32)
# test_x = torch.zeros(1, 3, 96, 96)
# print('input shape: {} x {} x {}'.format(test_x.shape[1], test_x.shape[2], test_x.shape[3]))
# test_y = test_net(test_x)
# print('output shape: {} x {} x {}'.format(test_y.shape[1], test_y.shape[2], test_y.shape[3]))


class googlenet(nn.Module):
    def __init__(self, in_channel, num_classes, verbose=False):
        super(googlenet, self).__init__()
        self.verbose = verbose
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels=in_channel, out_channels=64,
                      kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64, eps=1e-3),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=1),
            nn.BatchNorm2d(64, eps=1e-3),
            nn.ReLU(True),
            nn.Conv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1),
            nn.BatchNorm2d(192, eps=1e-3),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.block3 = nn.Sequential(
            inception(192, 64, 96, 128, 16, 32, 32),
            inception(256, 128, 128, 192, 32, 96, 64),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.block4 = nn.Sequential(
            inception(480, 192, 96, 208, 16, 48, 64),
            inception(512, 160, 112, 224, 24, 64, 64),
            inception(512, 128, 128, 256, 24, 64, 64),
            inception(512, 112, 144, 288, 32, 64, 64),
            inception(528, 256, 160, 320, 32, 128, 128),
            nn.MaxPool2d(3, 2)
        )
        self.block5 = nn.Sequential(
            inception(832, 256, 160, 320, 32, 128, 128),
            # 3x3-reduce is 192 per the paper (the original snippet had 182, a typo)
            inception(832, 384, 192, 384, 48, 128, 128),
            nn.AvgPool2d(2)
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.block1(x)
        if self.verbose:
            print('block 1 output: {}'.format(x.shape))
        x = self.block2(x)
        if self.verbose:
            print('block 2 output: {}'.format(x.shape))
        x = self.block3(x)
        if self.verbose:
            print('block 3 output: {}'.format(x.shape))
        x = self.block4(x)
        if self.verbose:
            print('block 4 output: {}'.format(x.shape))
        x = self.block5(x)
        if self.verbose:
            print('block 5 output: {}'.format(x.shape))
        # x is [b, 1024, 1, 1]
        x = x.view(x.shape[0], -1)  # x is [b, 1024]
        x = self.classifier(x)
        return x


def get_acc(output, label):
    total = output.shape[0]
    # the highest-scoring entry in each row is the prediction
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


batch_size = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=96),
    torchvision.transforms.ToTensor()
])

train_set = torchvision.datasets.FashionMNIST(
    root='dataset/', train=True, download=True, transform=transform
)
# hold-out split
train_set, val_set = torch.utils.data.random_split(train_set, [50000, 10000])
test_set = torchvision.datasets.FashionMNIST(
    root='dataset/', train=False, download=True, transform=transform
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_set, batch_size=batch_size, shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    dataset=val_set, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set, batch_size=batch_size, shuffle=False
)

net = googlenet(1, 10)
lr = 2e-3
optimizer = optim.Adam(net.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
net = net.to(device)
prev_time = datetime.now()
valid_data = val_loader

for epoch in range(3):
    train_loss = 0
    train_acc = 0
    net.train()
    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += get_acc(outputs, labels)  # averaged over batches at the end

    # timing
    cur_time = datetime.now()
    h, remainder = divmod((cur_time - prev_time).seconds, 3600)
    m, s = divmod(remainder, 60)
    time_str = 'Time %02d:%02d:%02d(from %02d/%02d/%02d %02d:%02d:%02d to %02d/%02d/%02d %02d:%02d:%02d)' % (
        h, m, s,
        prev_time.year, prev_time.month, prev_time.day,
        prev_time.hour, prev_time.minute, prev_time.second,
        cur_time.year, cur_time.month, cur_time.day,
        cur_time.hour, cur_time.minute, cur_time.second)
    prev_time = cur_time

    # validation
    with torch.no_grad():
        net.eval()
        valid_loss = 0
        valid_acc = 0
        for inputs, labels in valid_data:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
            valid_acc += get_acc(outputs, labels)
    print("Epoch %d. Train Loss: %f, Train Acc: %f, Valid Loss: %f, Valid Acc: %f, "
          % (epoch, train_loss / len(train_loader), train_acc / len(train_loader),
             valid_loss / len(valid_data), valid_acc / len(valid_data)) + time_str)

torch.save(net.state_dict(), 'checkpoints/params.pkl')

# test
with torch.no_grad():
    net.eval()
    correct = 0
    total = 0
    for (images, labels) in test_loader:
        images, labels = images.to(device), labels.to(device)
        output = net(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print("The accuracy of total {} images: {}%".format(total, 100 * correct / total))
```

You can see that the spatial size of the feature maps keeps shrinking while the channel dimension keeps growing.
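To verify that claim, here is a quick sanity check (a sketch using the `googlenet` class defined above with `verbose=True`):

```python
# instantiate the simplified network with shape printing enabled
test_net = googlenet(in_channel=1, num_classes=10, verbose=True)
test_x = torch.zeros(1, 1, 96, 96)  # one 96x96 grayscale image
test_y = test_net(test_x)
# expected block outputs: 64x23x23 -> 192x11x11 -> 480x5x5 -> 832x2x2 -> 1024x1x1
print(test_y.shape)  # torch.Size([1, 10])
```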
The flower dataset
http://download.tensorflow.org/example_images/flower_photos.tgz
After downloading and extracting, it should look like this:
Splitting and preprocessing the dataset
```python
import os
from shutil import copy
import random


def mkfile(file):
    if not os.path.exists(file):
        os.makedirs(file)


root_dir = 'dataset/flower/'
file_dir = root_dir + 'flower_photos/'
train_dir = root_dir + 'train/'
val_dir = root_dir + 'val/'

flower_class = [cla for cla in os.listdir(file_dir) if ".txt" not in cla]
mkfile(train_dir)
mkfile(val_dir)
for cla in flower_class:
    mkfile(train_dir + cla)
for cla in flower_class:
    mkfile(val_dir + cla)

# move 10% of each class into the validation set
split_rate = 0.1
for cla in flower_class:
    cla_path = file_dir + cla + '/'
    images = os.listdir(cla_path)
    num = len(images)
    eval_index = random.sample(images, k=int(num * split_rate))
    for index, image in enumerate(images):
        if image in eval_index:
            image_path = cla_path + image
            new_path = val_dir + cla
            copy(image_path, new_path)
        else:
            image_path = cla_path + image
            new_path = train_dir + cla
            copy(image_path, new_path)
        print("\r[{}] processing [{}/{}]".format(cla, index + 1, num), end="")  # progress bar
    print()
print("processing done!")
```

A class_indices.json file is also needed; it is generated by the training script below, not shipped with the flower dataset.
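With the five flower classes, the generated class_indices.json should look roughly like this (a sketch; the exact names and order come from ImageFolder's alphabetical sorting of the class folders):

```json
{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}
```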
Network implementation
```python
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import torch.optim as optim
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x


class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)  # output size == input size
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)  # output size == input size
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)


class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)
        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:  # eval mode skips this layer
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:  # eval mode skips this layer
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7
        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:  # eval mode skips this layer
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]
        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((224, 224)),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
}

root = 'dataset/flower/'  # flower data set path
train_dataset = datasets.ImageFolder(
    root=root + "train", transform=data_transform["train"]
)
train_num = len(train_dataset)

# {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
# write dict into json file
json_str = json.dumps(cla_dict, indent=4)
with open(root + 'class_indices.json', 'w') as json_file:
    json_file.write(json_str)

batch_size = 32
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=0
)
validate_dataset = datasets.ImageFolder(
    root=root + "val", transform=data_transform["val"]
)
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(
    dataset=validate_dataset, batch_size=batch_size, shuffle=False, num_workers=0
)

net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=3e-4)

best_acc = 0.0
save_path = 'checkpoints/googleNet.pth'
for epoch in range(6):
    # train
    net.train()
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        logits, aux_logits2, aux_logits1 = net(images.to(device))
        loss0 = criterion(logits, labels.to(device))
        loss1 = criterion(aux_logits1, labels.to(device))
        loss2 = criterion(aux_logits2, labels.to(device))
        loss = loss0 + loss1 * 0.3 + loss2 * 0.3
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        # print train progress
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.3f}".format(int(rate * 100), a, b, loss), end="")
    print()

    # validate
    net.eval()
    acc = 0.0  # accumulate the number of correct predictions per epoch
    with torch.no_grad():
        for data_test in validate_loader:
            test_images, test_labels = data_test
            outputs = net(test_images.to(device))  # eval mode only has the final output layer
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == test_labels.to(device)).sum().item()
        accurate_test = acc / val_num
        if accurate_test > best_acc:
            best_acc = accurate_test  # update the best accuracy (missing in the original snippet)
            torch.save(net.state_dict(), save_path)
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / step, accurate_test))

print('Finished Training')
```

predict
The predict code is kept out of the main training script because at inference time the auxiliary classifiers' outputs are no longer used, so the network structure changes.
The trained parameters therefore have to be saved first and loaded back in afterwards, and the loading must use strict=False: the checkpoint contains weights for the auxiliary classifiers that the aux-free model does not have, and strict=False tells load_state_dict to skip them.
```python
import torch
from model import GoogLeNet
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json

data_transform = transforms.Compose(
    [transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# load image
# img = Image.open("../tulip.jpg")
img = Image.open("dataset/flower/val/tulips/10791227_7168491604.jpg")
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    json_file = open('dataset/flower/class_indices.json', 'r')
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

# create model (without the auxiliary classifiers)
model = GoogLeNet(num_classes=5, aux_logits=False)
# load model weights
model_weight_path = "checkpoints/googleNet.pth"
missing_keys, unexpected_keys = model.load_state_dict(torch.load(model_weight_path), strict=False)
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
print(class_indict[str(predict_cla)])
plt.show()
```