
Classic CNNs in PyTorch (8) — ResNet (GAP global average pooling, bottleneck, CIFAR-10)


Proposed in 2015 by Kaiming He at Microsoft Research Asia.

Winner of the 2015 ImageNet ILSVRC competition.


 

ResNet comes in five main variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152.

ResNet-18 and ResNet-34 share the same basic building block and are the relatively shallow variants; the other three use a different building block (the bottleneck) and are the deeper networks.

The problem: deep networks perform poorly

The ResNet architecture is largely modeled on VGG-19, with residual units added on top of it via shortcut connections.

ResNet effectively addresses the difficulty of training deep neural networks and makes it possible to train convolutional networks with up to 1000 layers. Deep networks are hard to train because of vanishing gradients: the farther a layer is from the loss function, the smaller its gradient becomes during backpropagation and the harder it is to update, and the problem gets worse as depth increases. Two earlier workarounds were common:

  1. Layer-wise training: train the shallower layers first, then keep adding layers. This does not work particularly well and is cumbersome.
  2. Use wider layers (more output channels) instead of making the network deeper, but this also tends not to perform well.

ResNet solves the vanishing-gradient problem in backpropagation by introducing cross-layer (skip) connections.

Residual block

With ordinary connections, the gradient from an upper layer has to travel back through every layer in between; with a residual connection there is effectively a shorter path, so the gradient can flow back along it and avoid becoming too small.

The goal is to learn H(x), but as the network gets deeper, learning a good H(x) directly becomes hard. So split it: since H(x) = F(x) + x, we learn F(x) = H(x) - x instead of learning H(x) directly. F(x) is the so-called residual.

The shortcut branch on the right is called the identity branch, from the term "identity mapping".

Note that the shortcut is genuinely added (element-wise), not concatenated.
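As a quick illustration (a minimal sketch, not code from the original post), the residual connection adds F(x) to x element-wise, so shapes must match; concatenation would instead double the channel dimension:

import torch
from torch import nn

x = torch.randn(1, 64, 56, 56)
f = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # stand-in for the residual branch F

out_add = f(x) + x                       # shortcut: element-wise add, shape stays [1, 64, 56, 56]
out_cat = torch.cat([f(x), x], dim=1)    # concatenation (NOT what ResNet does): [1, 128, 56, 56]
print(out_add.shape, out_cat.shape)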
 

Table of architecture parameters for ResNets of different depths (see the table in the original paper).

The input size is 224×224.

Notice that whether it is ResNet-18, ResNet-34 or ResNet-152, the output feature-map sizes at each stage are the same.

In other words, every ResNet downsamples the original input by a factor of 32 and then flattens the result with average pooling.

For ResNet-50, the C5 output is a 32× downsampling of the original image; the same holds for ResNet-101 and every other depth.

This is because every ResNet, no matter how many layers, consists of 5 conv stages, and the stages are defined by the h, w size of their feature maps.

ResNets of different depths differ only in the number of channels and the number of conv layers inside each stage.
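A quick way to check the 32× claim (a sketch using the stock torchvision models, not code from this article): for a 224×224 input, the last conv stage of every depth ends at 7×7.

import torch
from torchvision import models

x = torch.randn(1, 3, 224, 224)
for ctor in (models.resnet18, models.resnet34, models.resnet50):
    m = ctor().eval()
    with torch.no_grad():
        h = m.maxpool(m.relu(m.bn1(m.conv1(x))))        # stem
        h = m.layer4(m.layer3(m.layer2(m.layer1(h))))   # conv2_x .. conv5_x
    print(ctor.__name__, tuple(h.shape))   # (1, 512, 7, 7) for 18/34, (1, 2048, 7, 7) for 50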

For ResNets with 50+ layers, bottleneck blocks are used to improve efficiency (as in GoogLeNet).
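The article does not list the bottleneck code, so here is a minimal sketch of what such a block looks like (parameter names are illustrative, not torchvision's exact implementation): a 1×1 conv first reduces the channel count, the 3×3 conv then operates on the narrow tensor, and a final 1×1 conv expands it back by 4×, which is much cheaper than stacking two wide 3×3 convs.

import torch
from torch import nn
import torch.nn.functional as F

class BottleneckSketch(nn.Module):
    expansion = 4
    def __init__(self, in_ch, mid_ch, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)                       # 1x1 reduce
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)  # 3x3 on narrow tensor
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, mid_ch * self.expansion, 1, bias=False)     # 1x1 expand (x4)
        self.bn3 = nn.BatchNorm2d(mid_ch * self.expansion)
        self.downsample = downsample   # 1x1 conv on the shortcut when the shape changes

    def forward(self, x):
        identity = self.downsample(x) if self.downsample is not None else x
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + identity)

# e.g. the first block of conv2_x in ResNet-50: 64 -> 256 channels
ds = nn.Sequential(nn.Conv2d(64, 256, 1, bias=False), nn.BatchNorm2d(256))
blk = BottleneckSketch(64, 64, downsample=ds)
print(blk(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 256, 56, 56])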

Training ResNet

- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
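A rough PyTorch translation of this recipe might look like the following (a sketch; the original training was done on ImageNet, and ReduceLROnPlateau is used here as one way to implement the divide-by-10-on-plateau rule):

import torch
from torch import optim
from torchvision import models

model = models.resnet34()
# SGD + momentum 0.9, initial lr 0.1, weight decay 1e-5 as in the list above
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5)
# divide the learning rate by 10 whenever the validation error plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)
# after each epoch: scheduler.step(val_loss)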

Properties of ResNet

① If all the weights in a residual block are set to zero, the block becomes the identity mapping.

So a residual network can effectively skip a layer whenever it does not need it.
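This is easy to verify with a toy block whose residual branch is initialized to zero (a sketch; it reproduces the input exactly only for non-negative inputs, because of the final ReLU):

import torch
from torch import nn

class ZeroBranchBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.f = nn.Conv2d(ch, ch, 3, padding=1)
        nn.init.zeros_(self.f.weight)   # residual branch F outputs all zeros
        nn.init.zeros_(self.f.bias)
    def forward(self, x):
        return torch.relu(self.f(x) + x)

x = torch.rand(1, 8, 4, 4)                       # non-negative, like ReLU activations
print(torch.allclose(ZeroBranchBlock(8)(x), x))  # True: the block acts as the identity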

② The biggest difference between a plain convolutional block and a deep residual network is that the residual network has many bypass branches feeding the input directly to later layers, so those layers can learn the residual directly; these branches are called shortcuts.

Traditional convolutional or fully connected layers inevitably lose or degrade some information as it is passed along. ResNet alleviates this by routing the input straight through to the output, preserving the information; the network then only has to learn the difference between input and output, which simplifies the learning objective and lowers the difficulty.

③ Gradient flow during backpropagation

When the upstream gradient reaches an addition gate during backpropagation, it splits and flows down both branches. The residual connection therefore acts like a highway for the gradient on the way back, so the network trains easier and faster.

The residual structure makes vanishing gradients much less likely during backpropagation: thanks to the skip connections, the gradient can flow unimpeded through the residual blocks.
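A one-variable example shows the effect (a sketch): at the addition node the upstream gradient is copied into both branches, so the shortcut always contributes an identity term of 1 to the derivative.

import torch

x = torch.tensor(2.0, requires_grad=True)
f = x ** 2            # stand-in residual branch, df/dx = 2x = 4 at x = 2
out = f + x           # residual connection: d(out)/dx = df/dx + 1
out.backward()
print(x.grad)         # tensor(5.)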

ResNet34

A GAP (global average pooling) layer is used at the end as well; apart from the final FC-1000 there are no other fully connected layers.
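For reference, global average pooling simply collapses each H×W feature map to one number, so the classifier only ever sees a [b, C] vector regardless of spatial size (a minimal sketch):

import torch
import torch.nn.functional as F

x = torch.randn(2, 512, 7, 7)          # last-stage feature maps
gap = F.adaptive_avg_pool2d(x, 1)      # [2, 512, 1, 1]
print(gap.flatten(1).shape)            # torch.Size([2, 512])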

The 224×224 input image first passes through one convolutional stem, then through 4 residual stages, and finally a softmax produces a 1000-dimensional vector corresponding to the 1000 ImageNet classes.

A standard ResNet expects 224×224 input.

import torch
from torch import nn, optim
import torch.nn.functional as F
from datetime import datetime
import torchvision
import os


# Sub-module: residual block
class ResidualBlock(nn.Module):
    def __init__(self, inchannel, outchannel, stride=1, shortcut=None):
        super(ResidualBlock, self).__init__()
        # left is F(x)
        self.left = nn.Sequential(
            nn.Conv2d(in_channels=inchannel, out_channels=outchannel, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(outchannel),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=outchannel, out_channels=outchannel, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        self.right = shortcut

    def forward(self, x):
        out = self.left(x)
        if self.right:
            residual = self.right(x)
        else:
            residual = x
        out += residual
        return F.relu(out)


class ResNet(nn.Module):
    '''
    Main module: ResNet-34.
    ResNet-34 consists of several layers, each containing multiple residual blocks.
    The residual block is implemented as a sub-module; _make_layer builds each layer.
    '''
    def __init__(self, num_classes=1000):
        super(ResNet, self).__init__()
        # stem: initial image transformation
        self.pre = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        # repeated layers with 3, 4, 6 and 3 residual blocks respectively
        self.layer1 = self._make_layer(64, 64, 3)
        self.layer2 = self._make_layer(64, 128, 4, stride=2)
        self.layer3 = self._make_layer(128, 256, 6, stride=2)
        self.layer4 = self._make_layer(256, 512, 3, stride=2)
        # fully connected layer for classification
        self.fc = nn.Linear(512, num_classes)

    # build a layer containing several residual blocks
    def _make_layer(self, inchannel, outchannel, block_num, stride=1):
        # shortcut becomes self.right of the first block: a single conv layer,
        # one layer fewer than the two-conv main branch
        shortcut = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, 1, stride, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        layers = []
        layers.append(ResidualBlock(inchannel, outchannel, stride, shortcut))
        for i in range(1, block_num):
            layers.append(ResidualBlock(outchannel, outchannel))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.pre(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = F.avg_pool2d(x, 7)          # global average pooling: a 224x224 input gives a 7x7 map here
        x = x.view(x.size(0), -1)
        return self.fc(x)


def get_acc(output, label):
    total = output.shape[0]
    # output holds the class scores; the argmax of each row is the prediction
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


batch_size = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=224),
    torchvision.transforms.ToTensor()
])

train_set = torchvision.datasets.CIFAR10(
    root='dataset/',
    train=True,
    download=True,
    transform=transform
)
# hold-out split into training and validation sets
train_set, val_set = torch.utils.data.random_split(train_set, [40000, 10000])
test_set = torchvision.datasets.CIFAR10(
    root='dataset/',
    train=False,
    download=True,
    transform=transform
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
    shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    dataset=val_set,
    batch_size=batch_size,
    shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set,
    batch_size=batch_size,
    shuffle=False
)

net = ResNet(num_classes=10)
lr = 2e-3
optimizer = optim.Adam(net.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
net = net.to(device)

prev_time = datetime.now()
valid_data = val_loader
for epoch in range(3):
    train_loss = 0
    train_acc = 0
    net.train()
    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += get_acc(outputs, labels)
        # loss and accuracy are averaged over the batches at the end

    # elapsed time
    cur_time = datetime.now()
    h, remainder = divmod((cur_time - prev_time).seconds, 3600)
    m, s = divmod(remainder, 60)
    # time_str = 'Time %02d:%02d:%02d' % (h, m, s)
    time_str = 'Time %02d:%02d:%02d(from %02d/%02d/%02d %02d:%02d:%02d to %02d/%02d/%02d %02d:%02d:%02d)' % (
        h, m, s, prev_time.year, prev_time.month, prev_time.day, prev_time.hour, prev_time.minute, prev_time.second,
        cur_time.year, cur_time.month, cur_time.day, cur_time.hour, cur_time.minute, cur_time.second)
    prev_time = cur_time

    # validation
    with torch.no_grad():
        net.eval()
        valid_loss = 0
        valid_acc = 0
        for inputs, labels in valid_data:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            valid_loss += loss.item()
            valid_acc += get_acc(outputs, labels)
    print("Epoch %d. Train Loss: %f, Train Acc: %f, Valid Loss: %f, Valid Acc: %f,"
          % (epoch, train_loss / len(train_loader), train_acc / len(train_loader), valid_loss / len(valid_data),
             valid_acc / len(valid_data))
          + time_str)

os.makedirs('checkpoints', exist_ok=True)   # make sure the checkpoint directory exists
torch.save(net.state_dict(), 'checkpoints/params.pkl')

# test
with torch.no_grad():
    net.eval()
    correct = 0
    total = 0
    for (images, labels) in test_loader:
        images, labels = images.to(device), labels.to(device)
        output = net(images)
        _, predicted = torch.max(output.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print("The accuracy of total {} images: {}%".format(total, 100 * correct / total))

The ResNet models are available out of the box in PyTorch:

from torchvision import models
model = models.resnet34()

The expected input shape is [bs, 3, 224, 224].
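To reuse it for the 10-class CIFAR-10 task as in this article, one option (a sketch) is to replace the 1000-way ImageNet head:

from torch import nn
from torchvision import models

model = models.resnet34()
model.fc = nn.Linear(model.fc.in_features, 10)   # swap the 1000-way head for a 10-class one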

ResNet18

resnet.py

import torch
from torch import nn
from torch.nn import functional as F


class ResBlk(nn.Module):
    """ResNet basic block"""
    def __init__(self, ch_in, ch_out, stride=1):
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)

        self.extra = nn.Sequential()
        if ch_out != ch_in or stride != 1:
            # project the shortcut with a 1x1 conv whenever the channel count
            # or the spatial size changes: [b, ch_in, h, w] => [b, ch_out, h', w']
            self.extra = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride),
                nn.BatchNorm2d(ch_out)
            )

    def forward(self, x):
        # x: [b, ch, h, w]
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # shortcut: element-wise add of the (possibly projected) input
        out = self.extra(x) + out
        return out


class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=3, padding=0),
            nn.BatchNorm2d(64)
        )
        # followed by 4 blocks, each halving the spatial size
        # [b, 64, h, w] => [b, 128, h/2, w/2]
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h/2, w/2]
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h/2, w/2]
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h/2, w/2]
        self.blk4 = ResBlk(512, 512, stride=2)
        self.outlayer = nn.Linear(512 * 1 * 1, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        # [b, 64, h, w] => [b, 512, h', w']
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        # print('after conv:', x.shape)
        # global average pooling: [b, 512, h, w] => [b, 512, 1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        # print('after pool:', x.shape)
        x = x.view(x.size(0), -1)
        x = self.outlayer(x)
        return x


def main():
    blk = ResBlk(64, 128, stride=2)
    tmp = torch.rand(2, 64, 32, 32)
    out = blk(tmp)
    print('block:', out.shape)

    x = torch.rand(2, 3, 32, 32)
    model = ResNet18()
    out = model(x)
    print('resnet:', out.shape)


if __name__ == '__main__':
    main()

main.py

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms
from torch import nn, optim

# from lenet5 import Lenet5
from resnet import ResNet18


def main():
    batchsz = 32

    cifar_train = datasets.CIFAR10('dataset/', train=True, transform=transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]), download=True)
    cifar_train = DataLoader(cifar_train, batch_size=batchsz, shuffle=True)

    cifar_test = datasets.CIFAR10('dataset/', train=False, transform=transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ]), download=True)
    cifar_test = DataLoader(cifar_test, batch_size=batchsz, shuffle=True)

    x, label = next(iter(cifar_train))
    print('x:', x.shape, 'label:', label.shape)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # model = Lenet5().to(device)
    model = ResNet18().to(device)

    criterion = nn.CrossEntropyLoss().to(device)   # includes the softmax, so the model outputs raw logits
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    print(model)

    for epoch in range(1000):
        model.train()
        for batchidx, (x, label) in enumerate(cifar_train):
            # x: [b, 3, 32, 32], label: [b]
            x, label = x.to(device), label.to(device)
            logits = model(x)
            # logits: [b, 10], label: [b]
            # loss: a 0-dim scalar tensor
            loss = criterion(logits, label)
            # backward pass
            optimizer.zero_grad()   # clear the old gradients
            loss.backward()         # compute the new gradients
            optimizer.step()        # apply the gradients to the weights
        print(epoch, ' loss: ', loss.item())

        # test
        model.eval()
        total_correct = 0
        total_num = 0
        for x, label in cifar_test:
            # x: [b, 3, 32, 32], label: [b]
            x, label = x.to(device), label.to(device)
            logits = model(x)            # [b, 10]
            pred = logits.argmax(dim=1)
            total_correct += torch.eq(pred, label).float().sum().item()
            total_num += x.size(0)       # batch size of this batch
        acc = total_correct / total_num
        print(epoch, ' acc: ', acc)


if __name__ == '__main__':
    main()
