
YOLOv9 Effective Accuracy Gains | Adding BAM, CloFormer, Reversible Column Networks, LSKblock, and Dozens of Other Attention Mechanisms (Part 2)

 


Column introduction: YOLOv9 Improvement Series | Covers the latest deep-learning innovations, focused on efficient accuracy gains!


1. Introduction

This article only contains the code and brief introductions of the attention modules. For the tutorial on adding them to YOLOv9, see the following article:

YOLOv9 Effective Accuracy Gains | Adding SE, CBAM, ECA, SimAM, and Dozens of Other Attention Mechanisms (Part 1)


CloFormer:《Rethinking Local Perception in Lightweight Vision Transformer》

        CloFormer is a lightweight vision transformer that improves performance on tasks such as image classification, object detection, and semantic segmentation by exploiting context-aware local enhancement. It introduces a convolution-style operation called AttnConv, which combines shared weights with context-aware weights to effectively capture high-frequency local information. Experimental results show that CloFormer offers clear advantages across a range of vision tasks.

from typing import List

import torch
import torch.nn as nn

# MemoryEfficientSwish comes from the efficientnet_pytorch package; nn.SiLU is a
# functionally equivalent fallback (both compute x * sigmoid(x)).
try:
    from efficientnet_pytorch.utils import MemoryEfficientSwish
except ImportError:
    MemoryEfficientSwish = nn.SiLU


class AttnMap(nn.Module):
    """Two 1x1 convolutions with a Swish in between, producing context-aware attention weights from q*k."""
    def __init__(self, dim):
        super().__init__()
        self.act_block = nn.Sequential(
            nn.Conv2d(dim, dim, 1, 1, 0),
            MemoryEfficientSwish(),
            nn.Conv2d(dim, dim, 1, 1, 0)
            # nn.Identity()
        )

    def forward(self, x):
        return self.act_block(x)


class EfficientAttention(nn.Module):
    """CloFormer attention: local high-frequency branches (AttnConv) plus a pooled global branch."""
    def __init__(self, dim, num_heads, group_split: List[int], kernel_sizes: List[int], window_size=7,
                 attn_drop=0., proj_drop=0., qkv_bias=True):
        super().__init__()
        assert sum(group_split) == num_heads
        assert len(kernel_sizes) + 1 == len(group_split)
        self.dim = dim
        self.num_heads = num_heads
        self.dim_head = dim // num_heads
        self.scalor = self.dim_head ** -0.5
        self.kernel_sizes = kernel_sizes
        self.window_size = window_size
        self.group_split = group_split
        convs = []
        act_blocks = []
        qkvs = []
        # projs = []
        for i in range(len(kernel_sizes)):
            kernel_size = kernel_sizes[i]
            group_head = group_split[i]
            if group_head == 0:
                continue
            convs.append(nn.Conv2d(3 * self.dim_head * group_head, 3 * self.dim_head * group_head, kernel_size,
                                   1, kernel_size // 2, groups=3 * self.dim_head * group_head))
            act_blocks.append(AttnMap(self.dim_head * group_head))
            qkvs.append(nn.Conv2d(dim, 3 * group_head * self.dim_head, 1, 1, 0, bias=qkv_bias))
            # projs.append(nn.Linear(group_head*self.dim_head, group_head*self.dim_head, bias=qkv_bias))
        if group_split[-1] != 0:
            self.global_q = nn.Conv2d(dim, group_split[-1] * self.dim_head, 1, 1, 0, bias=qkv_bias)
            self.global_kv = nn.Conv2d(dim, group_split[-1] * self.dim_head * 2, 1, 1, 0, bias=qkv_bias)
            # self.global_proj = nn.Linear(group_split[-1]*self.dim_head, group_split[-1]*self.dim_head, bias=qkv_bias)
        self.avgpool = nn.AvgPool2d(window_size, window_size) if window_size != 1 else nn.Identity()
        self.convs = nn.ModuleList(convs)
        self.act_blocks = nn.ModuleList(act_blocks)
        self.qkvs = nn.ModuleList(qkvs)
        self.proj = nn.Conv2d(dim, dim, 1, 1, 0, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj_drop = nn.Dropout(proj_drop)

    def high_fre_attntion(self, x: torch.Tensor, to_qkv: nn.Module, mixer: nn.Module, attn_block: nn.Module):
        '''
        x: (b c h w)
        '''
        b, c, h, w = x.size()
        qkv = to_qkv(x)  # (b (3 m d) h w)
        qkv = mixer(qkv).reshape(b, 3, -1, h, w).transpose(0, 1).contiguous()  # (3 b (m d) h w)
        q, k, v = qkv  # (b (m d) h w)
        attn = attn_block(q.mul(k)).mul(self.scalor)
        attn = self.attn_drop(torch.tanh(attn))
        res = attn.mul(v)  # (b (m d) h w)
        return res

    def low_fre_attention(self, x: torch.Tensor, to_q: nn.Module, to_kv: nn.Module, avgpool: nn.Module):
        '''
        x: (b c h w)
        '''
        b, c, h, w = x.size()
        q = to_q(x).reshape(b, -1, self.dim_head, h * w).transpose(-1, -2).contiguous()  # (b m (h w) d)
        kv = avgpool(x)  # (b c h w)
        kv = to_kv(kv).view(b, 2, -1, self.dim_head, (h * w) // (self.window_size ** 2)).permute(1, 0, 2, 4, 3).contiguous()  # (2 b m (H W) d)
        k, v = kv  # (b m (H W) d)
        attn = self.scalor * q @ k.transpose(-1, -2)  # (b m (h w) (H W))
        attn = self.attn_drop(attn.softmax(dim=-1))
        res = attn @ v  # (b m (h w) d)
        res = res.transpose(2, 3).reshape(b, -1, h, w).contiguous()
        return res

    def forward(self, x: torch.Tensor):
        '''
        x: (b c h w)
        '''
        res = []
        for i in range(len(self.kernel_sizes)):
            if self.group_split[i] == 0:
                continue
            res.append(self.high_fre_attntion(x, self.qkvs[i], self.convs[i], self.act_blocks[i]))
        if self.group_split[-1] != 0:
            res.append(self.low_fre_attention(x, self.global_q, self.global_kv, self.avgpool))
        return self.proj_drop(self.proj(torch.cat(res, dim=1)))
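
The snippet below is only a quick sanity check of the module, not a configuration from the paper; the dim, head, and window values are illustrative. group_split allocates heads between the local high-frequency branches and the final global branch, and kernel_sizes sets the depthwise kernel of each local branch.

import torch

# Illustrative settings: 4 heads, 2 on a 3x3 local AttnConv branch, 2 on the pooled global branch.
attn = EfficientAttention(dim=64, num_heads=4, group_split=[2, 2], kernel_sizes=[3], window_size=7)
x = torch.randn(1, 64, 56, 56)   # (b, c, h, w); h and w should be divisible by window_size
print(attn(x).shape)             # -> torch.Size([1, 64, 56, 56])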

《Reversible Column Networks》

        Reversible Column Networks (RevCol) is a new neural-network design paradigm. RevCol is composed of multiple copies of a sub-network (called "columns") connected by multi-level reversible connections. This architectural scheme makes RevCol behave very differently from conventional networks: during forward propagation, features are gradually disentangled as they pass through each column while the total information is preserved, rather than being compressed or discarded as in other networks.

 This one has not been integrated and tested here yet; the code is available at RevCol/models/revcol.py at main · megvii-research/RevCol (github.com): https://github.com/megvii-research/RevCol/blob/main/models/revcol.py. A conceptual sketch of the reversible connection follows below.
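
The following is only a conceptual sketch of the reversible connection idea described above, with placeholder names (ReversibleLevel, a stand-in sub-network f) that are not from the RevCol repository. The point it illustrates: because the previous column's feature can be recomputed exactly from the output, intermediate activations do not need to be cached.

import torch
import torch.nn as nn

class ReversibleLevel(nn.Module):
    """Sketch of one RevCol-style reversible connection (illustrative, not the official code).

    The level output of column t is computed as
        x_t = F(c) + gamma * x_{t-1}
    where c is the input from the current column and x_{t-1} is the same-level feature
    from the previous column. Since gamma is invertible, x_{t-1} can be recovered exactly:
        x_{t-1} = (x_t - F(c)) / gamma
    """
    def __init__(self, dim, gamma=0.5):
        super().__init__()
        self.f = nn.Sequential(           # stand-in for the real column sub-network
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
        )
        self.gamma = gamma

    def forward(self, c, x_prev):
        return self.f(c) + self.gamma * x_prev

    def inverse(self, c, x_out):
        # Reconstruct the previous-column feature instead of storing it.
        return (x_out - self.f(c)) / self.gamma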


《BAM: Bottleneck Attention Module》

        The Bottleneck Attention Module (BAM) examines the impact of attention mechanisms in deep neural networks and proposes a simple yet effective attention module that can be integrated into any feed-forward convolutional neural network, inferring attention maps along two separate pathways: channel and spatial. Placing the module at every bottleneck of the model (where feature-map downsampling occurs) builds a hierarchical attention structure with additional trainable parameters, and the result can be trained end-to-end with any feed-forward model.

import torch.nn as nn
from torch.nn import init


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.shape[0], -1)


class ChannelAttention(nn.Module):
    """Channel branch: global average pooling followed by a small bottleneck MLP."""
    def __init__(self, channel, reduction=16, num_layers=3):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        gate_channels = [channel]
        gate_channels += [channel // reduction] * num_layers
        gate_channels += [channel]
        self.ca = nn.Sequential()
        self.ca.add_module('flatten', Flatten())
        for i in range(len(gate_channels) - 2):
            self.ca.add_module('fc%d' % i, nn.Linear(gate_channels[i], gate_channels[i + 1]))
            self.ca.add_module('bn%d' % i, nn.BatchNorm1d(gate_channels[i + 1]))
            self.ca.add_module('relu%d' % i, nn.ReLU())
        self.ca.add_module('last_fc', nn.Linear(gate_channels[-2], gate_channels[-1]))

    def forward(self, x):
        res = self.avgpool(x)
        res = self.ca(res)
        res = res.unsqueeze(-1).unsqueeze(-1).expand_as(x)
        return res


class SpatialAttention(nn.Module):
    """Spatial branch: channel reduction, dilated 3x3 convolutions, then a single-channel map."""
    def __init__(self, channel, reduction=16, num_layers=3, dia_val=2):
        super().__init__()
        self.sa = nn.Sequential()
        self.sa.add_module('conv_reduce1',
                           nn.Conv2d(kernel_size=1, in_channels=channel, out_channels=channel // reduction))
        self.sa.add_module('bn_reduce1', nn.BatchNorm2d(channel // reduction))
        self.sa.add_module('relu_reduce1', nn.ReLU())
        for i in range(num_layers):
            self.sa.add_module('conv_%d' % i, nn.Conv2d(kernel_size=3, in_channels=channel // reduction,
                               out_channels=channel // reduction, padding=autopad(3, None, dia_val), dilation=dia_val))
            self.sa.add_module('bn_%d' % i, nn.BatchNorm2d(channel // reduction))
            self.sa.add_module('relu_%d' % i, nn.ReLU())
        self.sa.add_module('last_conv', nn.Conv2d(channel // reduction, 1, kernel_size=1))

    def forward(self, x):
        res = self.sa(x)
        res = res.expand_as(x)
        return res


class BAMBlock(nn.Module):
    """Bottleneck Attention Module: combines channel and spatial attention as (1 + sigmoid(...)) * x."""
    def __init__(self, channel=512, reduction=16, dia_val=2):
        super().__init__()
        self.ca = ChannelAttention(channel=channel, reduction=reduction)
        self.sa = SpatialAttention(channel=channel, reduction=reduction, dia_val=dia_val)
        self.sigmoid = nn.Sigmoid()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, std=0.001)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    def forward(self, x):
        b, c, _, _ = x.size()
        sa_out = self.sa(x)
        ca_out = self.ca(x)
        weight = self.sigmoid(sa_out + ca_out)
        out = (1 + weight) * x
        return out
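
A quick smoke test of the block (the channel count and input size below are illustrative, not from the paper). Note that the batch size must be at least 2 in training mode, because ChannelAttention uses BatchNorm1d.

import torch

bam = BAMBlock(channel=64, reduction=16, dia_val=2)
x = torch.randn(2, 64, 32, 32)   # batch of 2 so BatchNorm1d can compute statistics
print(bam(x).shape)              # -> torch.Size([2, 64, 32, 32])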

《Large Selective Kernel Network for Remote Sensing Object Detection》

         LSKNet is a large selective kernel network for remote sensing object detection. The paper proposes LSKNet as a new approach that can dynamically adjust its large spatial receptive field to better model the varying range context of different objects in remote sensing scenes. The authors note that this is the first exploration of large and selective kernel mechanisms in the field of remote sensing object detection.

import torch
import torch.nn as nn


class LSKblock(nn.Module):
    """Large Selective Kernel block: two depthwise branches with different receptive fields,
    fused by a spatial selection map derived from average- and max-pooled features."""
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3)
        self.conv1 = nn.Conv2d(dim, dim // 2, 1)
        self.conv2 = nn.Conv2d(dim, dim // 2, 1)
        self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3)
        self.conv = nn.Conv2d(dim // 2, dim, 1)

    def forward(self, x):
        attn1 = self.conv0(x)                 # 5x5 depthwise (smaller receptive field)
        attn2 = self.conv_spatial(attn1)      # 7x7 depthwise, dilation 3 (larger receptive field)
        attn1 = self.conv1(attn1)
        attn2 = self.conv2(attn2)
        attn = torch.cat([attn1, attn2], dim=1)
        avg_attn = torch.mean(attn, dim=1, keepdim=True)
        max_attn, _ = torch.max(attn, dim=1, keepdim=True)
        agg = torch.cat([avg_attn, max_attn], dim=1)
        sig = self.conv_squeeze(agg).sigmoid()    # per-branch spatial selection weights
        attn = attn1 * sig[:, 0, :, :].unsqueeze(1) + attn2 * sig[:, 1, :, :].unsqueeze(1)
        attn = self.conv(attn)
        return x * attn
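
A minimal usage example (the channel count is illustrative; dim should be even, since each branch is projected to dim // 2):

import torch

lsk = LSKblock(dim=64)
x = torch.randn(1, 64, 40, 40)
print(lsk(x).shape)   # -> torch.Size([1, 64, 40, 40])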

 
