Proposed in 2015 by Kaiming He at Microsoft Research Asia (MSRA).
Winner of the 2015 ImageNet ILSVRC competition.
ResNet comes in five main variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.
ResNet-18 and ResNet-34 share the same basic building block and are relatively shallow; the other three use a different building block and are much deeper networks.
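All five variants ship with torchvision, so a quick (illustrative) way to compare their sizes is to count parameters:

```python
from torchvision import models

# compare the number of parameters across the five standard ResNet variants
for name, builder in [('resnet18', models.resnet18), ('resnet34', models.resnet34),
                      ('resnet50', models.resnet50), ('resnet101', models.resnet101),
                      ('resnet152', models.resnet152)]:
    model = builder()  # randomly initialized; the weights are irrelevant for counting
    n_params = sum(p.numel() for p in model.parameters())
    print(f'{name}: {n_params / 1e6:.1f}M parameters')
```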
The problem: deeper plain networks perform worse
The ResNet architecture is modeled mainly on VGG-19, with residual units added on top of it through shortcut connections.
ResNet effectively solves the difficulty of training very deep neural networks and makes it possible to train convolutional networks with up to 1,000 layers. Deep networks are hard to train because of vanishing gradients: the farther a layer is from the loss function, the smaller its gradient becomes during backpropagation and the harder it is to update, and the problem gets worse as depth increases. Two earlier workarounds were common:
- Layer-wise training: train the shallow layers first, then keep adding layers. This is cumbersome and does not work particularly well.
- Making the layers wider (more output channels) instead of making the network deeper, which also tends to give poor results.
ResNet solves the vanishing-gradient problem during backpropagation by introducing cross-layer (skip) connections.
Residual block
With ordinary connections, the gradient from an upper layer has to travel back through every intermediate layer. A residual connection adds a shorter path in between, so the gradient can flow back along this shortcut and avoid becoming too small.
Our goal is to learn H(x), but when the network gets deep, learning a good H(x) directly becomes difficult. So we split it up: since H(x) = F(x) + x, we learn F(x) = H(x) - x instead of learning H(x) directly. F(x) is the so-called residual.
The shortcut on the right is called the identity branch, after the term identity mapping.
Note that the shortcut is genuinely added to the output element-wise, not concatenated.
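To make the distinction concrete, here is a minimal sketch with made-up tensors showing why an element-wise add keeps the shape unchanged while a concatenation would not:

```python
import torch

x = torch.randn(1, 64, 56, 56)    # input to the block
Fx = torch.randn(1, 64, 56, 56)   # output of the residual branch F(x)

out_add = Fx + x                     # ResNet shortcut: element-wise add, shape stays [1, 64, 56, 56]
out_cat = torch.cat([Fx, x], dim=1)  # concatenation would instead double the channels: [1, 128, 56, 56]

print(out_add.shape, out_cat.shape)
```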
Table of architecture parameters for ResNets of different depths.
The input size is 224×224.
Notice that ResNet-18, ResNet-34, and ResNet-152 all produce output feature maps of the same spatial size: every ResNet downsamples the input by a factor of 32 and then flattens the result with average pooling. For ResNet-50, the C5 output is a 32× downsampling of the original image, and the same holds for ResNet-101. This is because every ResNet, regardless of depth, is organized into five conv stages, and those stages are defined by the (h, w) size of their feature maps. ResNets of different depths differ only in the number of channels and the number of conv layers inside each stage.
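As a sanity check, the sketch below pushes a 224×224 input through the stages of torchvision's resnet18 and prints the feature-map sizes (the attribute names conv1, bn1, relu, maxpool, layer1–layer4, avgpool are those used by torchvision's ResNet):

```python
import torch
from torchvision import models

model = models.resnet18()  # randomly initialized; weights don't matter for shape checking
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    x = model.maxpool(model.relu(model.bn1(model.conv1(x))))  # stem: 224 -> 56
    for name in ['layer1', 'layer2', 'layer3', 'layer4']:
        x = getattr(model, name)(x)
        print(name, tuple(x.shape))   # layer4 ends at (1, 512, 7, 7), i.e. 224 / 32 = 7
    x = model.avgpool(x)              # global average pool -> (1, 512, 1, 1)
    print('after avgpool', tuple(x.shape))
```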
For ResNets with 50+ layers, a bottleneck block is used to improve efficiency (similar in spirit to GoogLeNet).
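For illustration, a minimal sketch of such a bottleneck block, assuming the usual 1×1 → 3×3 → 1×1 layout with a 4× channel expansion (a simplified version, not the exact torchvision implementation):

```python
import torch
from torch import nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 conv reduces the channels, 3x3 conv works at the reduced width, 1x1 conv expands again."""
    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)   # reduce
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)  # expand
        self.bn3 = nn.BatchNorm2d(out_ch)
        # 1x1 projection shortcut when the channel count or spatial size changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + self.shortcut(x))

# e.g. the first block of ResNet-50's conv2_x stage: 64 -> 64 -> 256 channels
blk = Bottleneck(64, 64)
print(blk(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```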
Training ResNet
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
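A minimal PyTorch sketch of this training recipe (the data pipeline and validation loop are assumed to exist elsewhere; val_error is a placeholder name):

```python
import torch
from torch import nn, optim
from torchvision import models

net = models.resnet34(num_classes=1000)  # BN after every conv layer is built into the model

# SGD + momentum 0.9, initial learning rate 0.1, weight decay 1e-5
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5)

# divide the learning rate by 10 when the validation error plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)

criterion = nn.CrossEntropyLoss()

# inside the training loop (mini-batches of 256):
#     loss = criterion(net(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
# and once per epoch, after computing the validation error:
#     scheduler.step(val_error)
```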
Properties of ResNet
① If all the weights inside a residual block are set to zero, the block reduces to an identity mapping.
So a residual network is free to effectively skip a block whenever that block is not needed.
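A quick check of this claim using torchvision's BasicBlock (the input is kept non-negative, as it would be after a preceding ReLU, so the block's final ReLU does not change it):

```python
import torch
from torchvision.models.resnet import BasicBlock

blk = BasicBlock(64, 64)   # a standard 64-channel basic block with no downsampling
blk.eval()

# zero every learnable parameter (conv weights, BN gammas and betas)
with torch.no_grad():
    for p in blk.parameters():
        p.zero_()

x = torch.rand(1, 64, 8, 8)    # non-negative input, as after a previous ReLU
out = blk(x)                   # the residual branch outputs 0, so out = relu(0 + x) = x
print(torch.allclose(out, x))  # True: the block behaves as an identity mapping
```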
② The biggest difference between a plain convolutional block and a deep residual network is that the residual network has many bypass branches that feed the input directly into later layers, so those layers only need to learn the residual; these bypasses are the shortcuts.
Traditional convolutional or fully connected layers inevitably lose or degrade some information as it passes through. ResNet mitigates this by routing the input directly to the output, which preserves the information; the network then only has to learn the part that differs between input and output, simplifying the learning target and making it easier to optimize.
③ Gradient flow during backpropagation
When the upstream gradient reaches an addition gate during backpropagation, it splits and flows down both branches. The residual connection therefore acts like a highway for the gradient on the backward pass, which is why the network trains easier and faster.
The residual structure makes vanishing gradients much less likely during backpropagation: thanks to the skip connections, gradients can pass unimpeded through the residual blocks.
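A tiny autograd example (with a made-up residual branch) illustrating the point: since out = F(x) + x, the shortcut contributes a constant 1 to dout/dx, so the local gradient never vanishes even when dF/dx is small:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
Fx = 0.001 * x ** 2   # stand-in for a residual branch with a very small local gradient
out = Fx + x          # residual connection: H(x) = F(x) + x
out.backward()

# d(out)/dx = dF/dx + 1 = 0.002 * x + 1 = 1.004
# the "+1" term is the gradient carried back through the shortcut
print(x.grad)         # tensor(1.0040)
```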
ResNet34
The network also ends with a GAP (global average pooling) layer; apart from the final FC-1000 there are no other fully connected layers.
The 224×224 input image first passes through one convolutional layer (the stem), then through four residual stages, and finally through a softmax that outputs a 1000-dimensional vector corresponding to the 1000 ImageNet classes.
A standard ResNet expects 224×224 input.
```python
import os
import torch
from torch import nn, optim
import torch.nn.functional as F
from datetime import datetime
import torchvision


# sub-module: Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, inchannel, outchannel, stride=1, shortcut=None):
        super(ResidualBlock, self).__init__()
        # "left" is the residual branch F(x)
        self.left = nn.Sequential(
            nn.Conv2d(in_channels=inchannel, out_channels=outchannel, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(outchannel),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=outchannel, out_channels=outchannel, kernel_size=3,
                      stride=1, padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        self.right = shortcut

    def forward(self, x):
        out = self.left(x)
        if self.right is not None:
            residual = self.right(x)
        else:
            residual = x
        out += residual
        return F.relu(out)


class ResNet(nn.Module):
    '''
    Main module: ResNet-34.
    ResNet-34 consists of several layers, and each layer contains multiple residual blocks.
    Residual blocks are implemented by the sub-module above; layers are built by _make_layer.
    '''
    def __init__(self, num_classes=1000):
        super(ResNet, self).__init__()
        # stem: the first few layers that transform the image
        self.pre = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        # repeated layers with 3, 4, 6, 3 residual blocks respectively
        self.layer1 = self._make_layer(64, 64, 3)
        self.layer2 = self._make_layer(64, 128, 4, stride=2)
        self.layer3 = self._make_layer(128, 256, 6, stride=2)
        self.layer4 = self._make_layer(256, 512, 3, stride=2)
        # fully connected layer for classification
        self.fc = nn.Linear(512, num_classes)

    # build a layer containing block_num residual blocks
    def _make_layer(self, inchannel, outchannel, block_num, stride=1):
        # the shortcut becomes self.right: a single 1x1 conv + BN,
        # i.e. one layer fewer than the two-conv main branch
        shortcut = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, 1, stride, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        layers = []
        layers.append(ResidualBlock(inchannel, outchannel, stride, shortcut))
        for i in range(1, block_num):
            layers.append(ResidualBlock(outchannel, outchannel))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.pre(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = F.avg_pool2d(x, 7)
        x = x.view(x.size(0), -1)
        return self.fc(x)


def get_acc(output, label):
    total = output.shape[0]
    # output holds the class scores; the argmax of each row is the prediction
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


batch_size = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=224),
    torchvision.transforms.ToTensor()
])
train_set = torchvision.datasets.CIFAR10(
    root='dataset/', train=True, download=True, transform=transform
)
# hold-out split for validation
train_set, val_set = torch.utils.data.random_split(train_set, [40000, 10000])
test_set = torchvision.datasets.CIFAR10(
    root='dataset/', train=False, download=True, transform=transform
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_set, batch_size=batch_size, shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    dataset=val_set, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set, batch_size=batch_size, shuffle=False
)

net = ResNet(num_classes=10)
lr = 2e-3
optimizer = optim.Adam(net.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
net = net.to(device)

os.makedirs('checkpoints', exist_ok=True)  # make sure the checkpoint directory exists
prev_time = datetime.now()
valid_data = val_loader
for epoch in range(3):
    train_loss = 0
    train_acc = 0
    net.train()
    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += get_acc(outputs, labels)  # averaged over the number of batches below

    # timing
    cur_time = datetime.now()
    h, remainder = divmod((cur_time - prev_time).seconds, 3600)
    m, s = divmod(remainder, 60)
    time_str = 'Time %02d:%02d:%02d(from %02d/%02d/%02d %02d:%02d:%02d to %02d/%02d/%02d %02d:%02d:%02d)' % (
        h, m, s,
        prev_time.year, prev_time.month, prev_time.day,
        prev_time.hour, prev_time.minute, prev_time.second,
        cur_time.year, cur_time.month, cur_time.day,
        cur_time.hour, cur_time.minute, cur_time.second)
    prev_time = cur_time

    # validation
    with torch.no_grad():
        net.eval()
        valid_loss = 0
        valid_acc = 0
        for inputs, labels in valid_data:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
            valid_acc += get_acc(outputs, labels)
    print("Epoch %d. Train Loss: %f, Train Acc: %f, Valid Loss: %f, Valid Acc: %f,"
          % (epoch, train_loss / len(train_loader), train_acc / len(train_loader),
             valid_loss / len(valid_data), valid_acc / len(valid_data))
          + time_str)
    torch.save(net.state_dict(), 'checkpoints/params.pkl')

# test
with torch.no_grad():
    net.eval()
    correct = 0
    total = 0
    for (images, labels) in test_loader:
        images, labels = images.to(device), labels.to(device)
        output = net(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print("The accuracy of total {} images: {}%".format(total, 100 * correct / total))
```
The ResNet models can be used directly from PyTorch's torchvision package:

```python
from torchvision import models

model = models.resnet34()
```

The expected input shape is [bs, 3, 224, 224].
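To fine-tune on a dataset with a different number of classes, a common pattern (a sketch, assuming 10 classes here) is to swap out the final fully connected layer:

```python
import torch
from torch import nn
from torchvision import models

model = models.resnet34()                       # optionally load pretrained weights instead
model.fc = nn.Linear(model.fc.in_features, 10)  # replace the 1000-way head with a 10-way one

x = torch.randn(2, 3, 224, 224)
print(model(x).shape)                           # torch.Size([2, 10])
```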
ResNet18
resnet.py
```python
import torch
from torch import nn
from torch.nn import functional as F


class ResBlk(nn.Module):
    """resnet block"""

    def __init__(self, ch_in, ch_out, stride=1):
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)

        # shortcut branch: use a 1x1 conv when the channel count or spatial size changes
        self.extra = nn.Sequential()
        if ch_out != ch_in or stride != 1:
            # [b, ch_in, h, w] => [b, ch_out, h, w]
            self.extra = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride),
                nn.BatchNorm2d(ch_out)
            )

    def forward(self, x):
        # x: [b, ch, h, w]
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # shortcut
        # extra module: [b, ch_in, h, w] => [b, ch_out, h, w]
        # element-wise add:
        out = self.extra(x) + out
        return out


class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(64)
        )
        # followed by 4 blocks
        # [b, 64, h, w] => [b, 128, h, w]
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h, w]
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h, w]
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h, w]
        self.blk4 = ResBlk(512, 512, stride=2)

        self.outlayer = nn.Linear(512 * 1 * 1, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        # [b, 64, h, w] => [b, 512, h, w]
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        # print('after conv:', x.shape)
        # [b, 512, h, w] => [b, 512, 1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        # print('after pool:', x.shape)
        x = x.view(x.size(0), -1)
        x = self.outlayer(x)
        return x


def main():
    blk = ResBlk(64, 128, stride=2)
    tmp = torch.rand(2, 64, 32, 32)
    out = blk(tmp)
    print('block:', out.shape)

    x = torch.rand(2, 3, 32, 32)
    model = ResNet18()
    out = model(x)
    print('resnet:', out.shape)


if __name__ == '__main__':
    main()
```

main.py
```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms
from torch import nn, optim

# from lenet5 import Lenet5
from resnet import ResNet18


def main():
    batchsz = 32

    cifar_train = datasets.CIFAR10('dataset/', train=True, transform=transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]), download=True)
    cifar_train = DataLoader(cifar_train, batch_size=batchsz, shuffle=True)

    cifar_test = datasets.CIFAR10('dataset/', train=False, transform=transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]), download=True)
    cifar_test = DataLoader(cifar_test, batch_size=batchsz, shuffle=True)

    x, label = next(iter(cifar_train))
    print('x:', x.shape, 'label:', label.shape)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # model = Lenet5().to(device)
    model = ResNet18().to(device)

    criterion = nn.CrossEntropyLoss().to(device)  # includes the softmax
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    print(model)

    for epoch in range(1000):
        model.train()
        for batchidx, (x, label) in enumerate(cifar_train):
            # x: [b, 3, 32, 32], label: [b]
            x, label = x.to(device), label.to(device)

            logits = model(x)
            # logits: [b, 10], label: [b]
            # loss: a scalar tensor
            loss = criterion(logits, label)

            # backprop
            optimizer.zero_grad()  # clear the old gradients
            loss.backward()        # compute the new gradients
            optimizer.step()       # apply the update to the weights

        print(epoch, ' loss: ', loss.item())

        model.eval()
        # test
        total_correct = 0
        total_num = 0
        for x, label in cifar_test:
            # x: [b, 3, 32, 32], label: [b]
            x, label = x.to(device), label.to(device)
            # logits: [b, 10]
            logits = model(x)
            pred = logits.argmax(dim=1)
            total_correct += torch.eq(pred, label).float().sum().item()
            total_num += x.size(0)  # i.e. the batch size
        acc = total_correct / total_num
        print(epoch, ' acc: ', acc)


if __name__ == '__main__':
    main()
```