
Learning the ECANet Attention Mechanism (with Code)


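Paper: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

1. What is it?

The ECA attention module was proposed in the CVPR 2020 paper "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks".

ECA (Efficient Channel Attention) is a lightweight channel attention mechanism. It learns channel attention with a single 1D convolution layer, which keeps the computational cost low. Rather than reducing the channel dimension, ECA uses the 1D convolution to realize local cross-channel interaction and thereby capture the dependencies between channels.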

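2. Why?

When a neural network extracts features, the channels of the resulting feature map do not all contribute equally: different channels carry different weight for the final result.

ECA brings channel attention into convolutional neural networks. It adaptively reweights the feature map of each channel based on global information, which improves the representational power of the features.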

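3. How?

3.1 Avoiding dimensionality reduction

Empirical analysis shows that dimensionality reduction has a side effect on channel attention prediction, and that capturing dependencies across all channels is inefficient and unnecessary. The SE block computes the channel weights with two FC layers. ECA, in contrast, generates the channel weights with a fast 1D convolution of size k, where k is determined adaptively as a function of the channel dimension C.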
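To make the cost difference concrete, here is a minimal sketch that counts the attention weights of one SE block against one ECA module. The reduction ratio r = 16 is an assumption (it is the commonly used SE default, not something stated in this article), the helper names are hypothetical, and biases are ignored.

```python
def se_params(channels: int, reduction: int = 16) -> int:
    """Weights in the two FC layers of an SE block (biases ignored).

    FC1 maps C -> C/r and FC2 maps C/r -> C, so the total is 2 * C * (C // r).
    """
    hidden = channels // reduction
    return channels * hidden + hidden * channels


def eca_params(k: int = 3) -> int:
    """Weights in the single 1D convolution of an ECA module (no bias)."""
    return k


# Compare the two for a few typical channel counts.
for c in (64, 256, 1024):
    print(c, se_params(c), eca_params(3))
```

The SE count grows quadratically with C, while the ECA count depends only on the kernel size k.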

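3.2 Cross-channel interaction

Although SE-Var2 and SE-Var3 both keep the channel dimension unchanged, the latter performs better. The main difference is that SE-Var3 captures cross-channel interaction while SE-Var2 does not, which indicates that cross-channel interaction helps in learning effective attention. However, SE-Var3 requires a large number of parameters, which makes the model too complex. From the perspective of efficient convolution, SE-Var2 can be regarded as a depthwise separable convolution (Chollet 2017). Naturally, group convolution, another efficient form of convolution, can also be used to capture cross-channel interaction: given an FC layer, group convolution splits it into multiple groups and performs a linear transformation independently within each group.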
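The grouping idea can be sketched in a few lines of plain Python. This is an illustration only: `grouped_linear` and the example weights are hypothetical, and the block-diagonal view assumes C is divisible by the number of groups G, which gives C²/G weights instead of C².

```python
def grouped_linear(y, group_weights):
    """Apply an independent linear map to each group of channels.

    y: list of C channel descriptors.
    group_weights: list of G square matrices, each (C/G) x (C/G),
                   i.e. the non-zero blocks of a block-diagonal C x C matrix.
    """
    groups = len(group_weights)
    size = len(y) // groups
    assert size * groups == len(y), "C must be divisible by the number of groups"
    out = []
    for g, w in enumerate(group_weights):
        chunk = y[g * size:(g + 1) * size]
        # Channels only interact with other channels of the same group.
        for row in w:
            out.append(sum(w_ij * y_j for w_ij, y_j in zip(row, chunk)))
    return out


y = [1.0, 2.0, 3.0, 4.0]                 # C = 4
blocks = [
    [[1.0, 1.0], [0.0, 1.0]],            # group 0 mixes channels 0 and 1
    [[2.0, 0.0], [1.0, 1.0]],            # group 1 mixes channels 2 and 3
]
result = grouped_linear(y, blocks)
n_params = sum(len(w) * len(w[0]) for w in blocks)   # C^2 / G = 16 / 2
```

Channels in different groups never interact, which is the limitation that motivates looking at other ways of capturing cross-channel interaction.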

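Derivation:

For the aggregated feature $y \in \mathbb{R}^{C}$ obtained without dimensionality reduction, channel attention can be learned as:

$$\omega = \sigma(W y)$$

where $W$ is a C × C parameter matrix and $\sigma$ is the sigmoid function;

$W_{var2}$ is a diagonal matrix with C parameters;
$W_{var3}$ is a full matrix with C × C parameters;
The key difference is that SE-Var3 considers cross-channel interaction while SE-Var2 does not, so SE-Var3 performs better;

ECA-Net explores another way of capturing local cross-channel interaction, aiming to guarantee both efficiency and effectiveness. It uses a band matrix $W_k$ to learn channel attention, so that the weight of each channel depends only on its k neighboring channels. With the k weights shared across all channels, this becomes:

$$\omega = \sigma(\mathrm{C1D}_k(y))$$

where C1D denotes 1D convolution;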
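The link between the band matrix and the 1D convolution can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes a centered window with zero padding at the channel boundaries (the same convention as the `nn.Conv1d` call in section 4.1), and all function names and example values are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def band_matrix(weights, channels):
    """Build the C x C band matrix whose rows all share the same k weights."""
    k = len(weights)
    pad = (k - 1) // 2
    w = [[0.0] * channels for _ in range(channels)]
    for i in range(channels):
        for j, w_j in enumerate(weights):
            col = i + j - pad
            if 0 <= col < channels:      # entries falling outside the matrix are dropped
                w[i][col] = w_j
    return w


def conv1d_same(y, weights):
    """Zero-padded 1D cross-correlation over the channel axis (output length C)."""
    k = len(weights)
    pad = (k - 1) // 2
    padded = [0.0] * pad + list(y) + [0.0] * pad
    return [sum(weights[j] * padded[i + j] for j in range(k))
            for i in range(len(y))]


y = [0.5, -1.0, 2.0, 0.0, 1.5]           # pooled descriptor, C = 5
kernel = [0.2, 0.5, -0.3]                # k = 3 shared weights

w_k = band_matrix(kernel, len(y))
via_matrix = [sigmoid(sum(w_ij * y_j for w_ij, y_j in zip(row, y))) for row in w_k]
via_conv = [sigmoid(t) for t in conv1d_same(y, kernel)]
```

Both routes produce the same attention weights, but the convolution stores only k values instead of a C × C matrix.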

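3.3 Adaptive kernel size

Because a 1D convolution is used to capture local cross-channel interaction, k determines the coverage of that interaction, and the appropriate coverage may differ for convolution blocks with different channel counts and for different CNN architectures. Although k could be tuned manually, doing so would consume a lot of computing resources. It is reasonable to relate k to the channel dimension C: it is generally assumed that a larger channel dimension favors long-range interaction, while a smaller channel dimension favors short-range interaction.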
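The paper expresses this relationship as k = ψ(C) = |log2(C)/γ + b/γ|_odd, where |t|_odd is the nearest odd number to t, with γ = 2 and b = 1 in its experiments. A minimal sketch of that mapping follows. The helper name is hypothetical, and the rounding (truncate, then bump even values up by one) is a common implementation choice that is assumed here rather than taken from this article.

```python
import math


def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Map a channel count C to an odd 1D kernel size k."""
    t = int(abs((math.log2(channels) + b) / gamma))
    # Force an odd size so the kernel has a well-defined center.
    return t if t % 2 == 1 else t + 1


for c in (64, 128, 256, 512, 1024, 2048):
    print(c, adaptive_kernel_size(c))
```

Note that the code in section 4 does not compute k this way: it takes `k_size` as an argument with a default of 3.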

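3.4 Development directions

The development of attention modules can be roughly divided into two directions:

(1) enhancing feature aggregation;
(2) combining channel and spatial attention.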

4. Code

4.1 eca_module.py

from torch import nn


class eca_layer(nn.Module):
    """Constructs an ECA module.

    Args:
        channel: Number of channels of the input feature map
            (kept for interface compatibility; the layer itself does not use it)
        k_size: Kernel size of the 1D convolution
    """

    def __init__(self, channel, k_size=3):
        super(eca_layer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Global average pooling: (B, C, H, W) -> (B, C, 1, 1)
        y = self.avg_pool(x)
        # 1D convolution across channels: (B, C, 1, 1) -> (B, 1, C) -> conv -> (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Sigmoid turns the result into per-channel weights in (0, 1)
        y = self.sigmoid(y)
        # Rescale every channel of the input by its weight
        return x * y.expand_as(x)
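To see what each step of the `forward` method does without running PyTorch, the following is a framework-free sketch of the same four steps for a single sample stored as nested lists. It is only an illustration: `eca_forward` and the example values are hypothetical, and the kernel weights are passed in by hand instead of being learned.

```python
import math


def eca_forward(x, kernel):
    """x: nested list [C][H][W] for one sample; kernel: k conv weights (k odd)."""
    channels = len(x)
    # 1. Global average pooling: one scalar descriptor per channel.
    y = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0])) for fmap in x]
    # 2. 1D convolution across neighboring channels, zero-padded to keep length C.
    pad = (len(kernel) - 1) // 2
    padded = [0.0] * pad + y + [0.0] * pad
    z = [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(channels)]
    # 3. Sigmoid gate in (0, 1) for each channel.
    gate = [1.0 / (1.0 + math.exp(-t)) for t in z]
    # 4. Rescale every spatial position of channel c by gate[c].
    out = [[[v * gate[c] for v in row] for row in x[c]] for c in range(channels)]
    return out, gate


x = [
    [[0.0, 0.0], [0.0, 0.0]],    # channel 0: mean 0.0
    [[2.0, 2.0], [2.0, 2.0]],    # channel 1: mean 2.0
    [[-1.0, 1.0], [3.0, 1.0]],   # channel 2: mean 1.0
]
# With the identity kernel [0, 1, 0], each gate is simply sigmoid(channel mean).
out, gate = eca_forward(x, kernel=[0.0, 1.0, 0.0])
```

With a learned kernel, each gate would also depend on the pooled values of the neighboring channels, which is the local cross-channel interaction described in section 3.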

4.2 eca_resnet.py

import math

import torch.nn as nn

from eca_module import eca_layer


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class ECABasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, k_size=3):
        super(ECABasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.eca = eca_layer(planes, k_size)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        # Channel attention is applied before the residual addition
        out = self.eca(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)
        return out


class ECABottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, k_size=3):
        super(ECABottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.eca = eca_layer(planes * 4, k_size)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        # Channel attention is applied before the residual addition
        out = self.eca(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)
        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, k_size=(3, 3, 3, 3)):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # One ECA kernel size per stage
        self.layer1 = self._make_layer(block, 64, layers[0], int(k_size[0]))
        self.layer2 = self._make_layer(block, 128, layers[1], int(k_size[1]), stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], int(k_size[2]), stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], int(k_size[3]), stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Weight initialization for the 2D convolutions and batch norm layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, k_size, stride=1):
        downsample = None
        # Project the shortcut when the spatial size or channel count changes
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, k_size))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, k_size=k_size))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


def eca_resnet18(k_size=(3, 3, 3, 3), num_classes=1000, pretrained=False):
    """Constructs a ResNet-18 model.

    Args:
        k_size: ECA kernel size for each of the four stages
        num_classes: Number of classification classes
        pretrained (bool): Accepted for API compatibility; no weights are loaded here
    """
    model = ResNet(ECABasicBlock, [2, 2, 2, 2], num_classes=num_classes, k_size=k_size)
    model.avgpool = nn.AdaptiveAvgPool2d(1)
    return model


def eca_resnet34(k_size=(3, 3, 3, 3), num_classes=1000, pretrained=False):
    """Constructs a ResNet-34 model.

    Args:
        k_size: ECA kernel size for each of the four stages
        num_classes: Number of classification classes
        pretrained (bool): Accepted for API compatibility; no weights are loaded here
    """
    model = ResNet(ECABasicBlock, [3, 4, 6, 3], num_classes=num_classes, k_size=k_size)
    model.avgpool = nn.AdaptiveAvgPool2d(1)
    return model


def eca_resnet50(k_size=(3, 3, 3, 3), num_classes=1000, pretrained=False):
    """Constructs a ResNet-50 model.

    Args:
        k_size: ECA kernel size for each of the four stages
        num_classes: Number of classification classes
        pretrained (bool): Accepted for API compatibility; no weights are loaded here
    """
    print("Constructing eca_resnet50......")
    model = ResNet(ECABottleneck, [3, 4, 6, 3], num_classes=num_classes, k_size=k_size)
    model.avgpool = nn.AdaptiveAvgPool2d(1)
    return model


def eca_resnet101(k_size=(3, 3, 3, 3), num_classes=1000, pretrained=False):
    """Constructs a ResNet-101 model.

    Args:
        k_size: ECA kernel size for each of the four stages
        num_classes: Number of classification classes
        pretrained (bool): Accepted for API compatibility; no weights are loaded here
    """
    model = ResNet(ECABottleneck, [3, 4, 23, 3], num_classes=num_classes, k_size=k_size)
    model.avgpool = nn.AdaptiveAvgPool2d(1)
    return model


def eca_resnet152(k_size=(3, 3, 3, 3), num_classes=1000, pretrained=False):
    """Constructs a ResNet-152 model.

    Args:
        k_size: ECA kernel size for each of the four stages
        num_classes: Number of classification classes
        pretrained (bool): Accepted for API compatibility; no weights are loaded here
    """
    model = ResNet(ECABottleneck, [3, 8, 36, 3], num_classes=num_classes, k_size=k_size)
    model.avgpool = nn.AdaptiveAvgPool2d(1)
    return model

