
YOLOv9 Improvement Strategies | Detail-Level Innovations | Innovating RepNCSPELAN4 with the Iterative Attentional Feature Fusion (iAFF) Mechanism


I. Introduction

The improvement presented here is iAFF (iterative Attentional Feature Fusion), whose main idea is to raise detection accuracy by improving the feature-fusion process. Traditional fusion methods such as addition or concatenation are simplistic and ignore whether the fusion actually suits a given object. iAFF introduces a multi-scale channel attention module (personally, I would describe this mechanism as a summation operation fused with attention) to better integrate features of different scales and inconsistent semantics. The method is a detail-level change that touches no other module, so it combines well with other improvements, and it also yields some accuracy gain on its own. You are welcome to subscribe to this column: it is updated with 3-5 new mechanisms every week, and subscribers get all of my improvement files plus access to the discussion group.

Recommendation: ⭐⭐⭐⭐

Accuracy gain: ⭐⭐⭐⭐

Column link: YOLOv9 Effective Improvements Column - continuously reproducing the latest top-conference work - effective gains - the most complete improvement column on the web

Contents

I. Introduction

II. Framework and Principles of iAFF

III. Core Code of AFF

IV. Step-by-Step Guide to Adding AFF

4.1 Steps to Add AFF

4.1.1 Modification 1

4.1.2 Modification 2

4.1.3 Modification 3

4.1.4 Modification 4

4.2 The yaml File for AFF

4.3 Training Screenshots

V. Summary



II. Framework and Principles of iAFF


Official paper address: click here to open the paper

Official code address: click here to open the code



The main idea of iAFF is to improve feature fusion through a more refined attention mechanism, thereby strengthening convolutional networks. It addresses the fusion problems caused by inconsistent scales and semantics, and its multi-scale channel attention module provides a unified, general-purpose fusion scheme. In addition, iAFF applies attentional feature fusion iteratively, resolving the bottleneck that the initial integration of the feature maps can become. As a result, the model can achieve good accuracy even with fewer layers or parameters.

The main innovations of iAFF include:

1. Attentional feature fusion: a new way of fusing features that uses attention to improve on simple methods such as addition or concatenation.

2. Multi-scale channel attention module: tackles the problems that arise when fusing features across scales, in particular features whose semantics and scales are inconsistent.

3. Iterative attentional feature fusion (iAFF): applies the attention mechanism iteratively to refine the initial integration of the feature maps, removing the bottleneck that this initial integration can become.


The figure illustrates the proposed AFF (Attentional Feature Fusion) and iAFF (iterative Attentional Feature Fusion). Two structures are shown:

(a) AFF: the basic framework that fuses two features through the multi-scale channel attention module (MS-CAM). Feature maps X and Y pass through MS-CAM and the fusion operations to produce the output Z.

(b) iAFF: similar to AFF, but with an iterative structure. Here the output Z is fed back to the input and, together with X and Y, passes through MS-CAM and the fusion operations again to further refine the fusion.

(Both methods come from the paper. I only used iAFF, the more elaborate of the two; if you are interested in plain AFF, you can add it following the same procedure.)
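To make the iterative structure in (b) concrete, here is a minimal sketch of the iAFF forward pass, modeled on the official open-aff implementation. The class name iAFFSketch and the injected ms_cam1/ms_cam2 modules are illustrative stand-ins for the two MS-CAM attention blocks (their full structure appears in the AFF class in Section III); this is a sketch for understanding, not the code used in the modification below.

import torch
import torch.nn as nn

class iAFFSketch(nn.Module):
    """Minimal sketch of iterative attentional feature fusion (iAFF).
    ms_cam1 / ms_cam2 stand in for two independent MS-CAM modules
    (local + global channel attention, as in the AFF class in Section III)."""

    def __init__(self, ms_cam1: nn.Module, ms_cam2: nn.Module):
        super().__init__()
        self.ms_cam1 = ms_cam1
        self.ms_cam2 = ms_cam2
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, residual):
        # Pass 1: attention weights computed from the naive sum x + residual
        wei = self.sigmoid(self.ms_cam1(x + residual))
        xi = x * wei + residual * (1 - wei)      # refined initial integration
        # Pass 2: re-weight using the refined integration from pass 1
        wei2 = self.sigmoid(self.ms_cam2(xi))
        return x * wei2 + residual * (1 - wei2)

# smoke test with identity stand-ins for the attention branches:
# out = iAFFSketch(nn.Identity(), nn.Identity())(torch.randn(1, 8, 4, 4), torch.randn(1, 8, 4, 4))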


III. Core Code of AFF

This module takes two inputs. Some people use it to replace the Concat operation, but its two inputs must have the same shape, while in YOLOv9 the two inputs to a Concat usually differ in width and height. I therefore use it to replace the residual addition inside the Bottleneck, which makes it a fairly detail-level innovation.

import torch
import torch.nn as nn
import numpy as np

__all__ = ['RepNCSPELAN4_AFF']


class AFF(nn.Module):
    '''
    Attentional Feature Fusion (AFF): fuses two same-shape feature maps
    through a multi-scale channel attention module (MS-CAM).
    '''
    def __init__(self, channels=64, r=2):
        super(AFF, self).__init__()
        inter_channels = int(channels // r)

        # local channel context: point-wise convolutions, one weight per position
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(channels),
        )

        # global channel context: the AdaptiveAvgPool2d (present in the official
        # open-aff implementation, missing in some copies of this code) is what
        # makes this branch global rather than a duplicate of the local one
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(channels),
        )

        self.sigmoid = nn.Sigmoid()

    def forward(self, x, residual):
        xa = x + residual              # naive initial integration
        xl = self.local_att(xa)
        xg = self.global_att(xa)
        xlg = xl + xg                  # combine local and global context
        wei = self.sigmoid(xlg)        # fusion weights in (0, 1)
        xo = 2 * x * wei + 2 * residual * (1 - wei)
        return xo
class RepConvN(nn.Module):
    """RepConv is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    """
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):
        super().__init__()
        assert k == 3 and p == 1
        self.g = g
        self.c1 = c1
        self.c2 = c2
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

        self.bn = None
        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)
        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)

    def forward_fuse(self, x):
        """Forward process"""
        return self.act(self.conv(x))

    def forward(self, x):
        """Forward process"""
        id_out = 0 if self.bn is None else self.bn(x)
        return self.act(self.conv1(x) + self.conv2(x) + id_out)

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)
        kernelid, biasid = self._fuse_bn_tensor(self.bn)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _avg_to_3x3_tensor(self, avgp):
        channels = self.c1
        groups = self.g
        kernel_size = avgp.kernel_size
        input_dim = channels // groups
        k = torch.zeros((channels, input_dim, kernel_size, kernel_size))
        k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2
        return k

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, Conv):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        elif isinstance(branch, nn.BatchNorm2d):
            if not hasattr(self, 'id_tensor'):
                input_dim = self.c1 // self.g
                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.c1):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def fuse_convs(self):
        if hasattr(self, 'conv'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,
                              out_channels=self.conv1.conv.out_channels,
                              kernel_size=self.conv1.conv.kernel_size,
                              stride=self.conv1.conv.stride,
                              padding=self.conv1.conv.padding,
                              dilation=self.conv1.conv.dilation,
                              groups=self.conv1.conv.groups,
                              bias=True).requires_grad_(False)
        self.conv.weight.data = kernel
        self.conv.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('conv1')
        self.__delattr__('conv2')
        if hasattr(self, 'nm'):
            self.__delattr__('nm')
        if hasattr(self, 'bn'):
            self.__delattr__('bn')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    # Pad to 'same' shape outputs
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))
class RepNBottleneck_AFF(nn.Module):
    # Standard bottleneck, with the residual addition replaced by AFF
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = RepConvN(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2
        self.AFF = AFF(c2)

    def forward(self, x):
        if self.add:
            # attention-weighted fusion instead of plain `x + self.cv2(self.cv1(x))`
            results = self.AFF(x, self.cv2(self.cv1(x)))
        else:
            results = self.cv2(self.cv1(x))
        return results


class RepNCSP(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(RepNBottleneck_AFF(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class RepNCSPELAN4_AFF(nn.Module):
    # csp-elan
    def __init__(self, c1, c2, c3, c4, c5=1):  # ch_in, ch_out, hidden channels c3/c4, number of RepNCSP blocks c5
        super().__init__()
        self.c = c3 // 2
        self.cv1 = Conv(c1, c3, 1, 1)
        self.cv2 = nn.Sequential(RepNCSP(c3 // 2, c4, c5), Conv(c4, c4, 3, 1))
        self.cv3 = nn.Sequential(RepNCSP(c4, c4, c5), Conv(c4, c4, 3, 1))
        self.cv4 = Conv(c3 + (2 * c4), c2, 1, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))
        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])
        return self.cv4(torch.cat(y, 1))

    def forward_split(self, x):
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])
        return self.cv4(torch.cat(y, 1))


if __name__ == '__main__':
    # batch size > 1 so the BatchNorm layers after the global pooling
    # can compute batch statistics in training mode
    x1 = torch.randn(2, 32, 16, 16)
    x2 = torch.randn(2, 32, 16, 16)
    model = AFF(32)
    x = model(x1, x2)
    print(x.shape)  # torch.Size([2, 32, 16, 16])
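Beyond the AFF smoke test in __main__ above, the assembled block can be sanity-checked the same way; the channel numbers below are illustrative, not taken from any particular yaml:

block = RepNCSPELAN4_AFF(128, 256, 128, 64, 1)
print(block(torch.randn(2, 128, 32, 32)).shape)  # torch.Size([2, 256, 32, 32])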


IV. Step-by-Step Guide to Adding AFF

4.1 Steps to Add AFF

4.1.1 Modification 1

First, locate the directory 'yolov9-main/models' and create a new sub-directory in it to serve as the repository for improvements. You can place all of your improvement .py files there, named however you like (the names don't affect anything, as long as the imports below match them). Then copy the core AFF code from Section III into the new file. A possible layout is sketched below.
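For example, assuming you call the new directory modules and the new file AFF.py (both names are your choice, not mandated by the repo):

# yolov9-main/
#   models/
#     modules/        <- new directory holding improvement modules
#       AFF.py        <- paste the core code from Section III here
#       __init__.py   <- created in step 4.1.2 below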


4.1.2 Modification 2

Next, create an __init__.py file in the new directory (you only need to create this file once) and add the code that imports our module. Note the '.' prefix, which marks the current package.
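A minimal example, assuming the directory is named modules and the file AFF.py as in the sketch above (adjust both to your actual names):

# models/modules/__init__.py
from .AFF import *  # the leading '.' means "import from the current package"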


4.1.3 Modification 3

Then open the file 'models/yolo.py' and import our module at the top, as follows ->

(If you have applied several of my improvement mechanisms, this import only needs to be added once; there is no need to repeat it.)

Note: the import must be placed ABOVE the import of common!
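For example, continuing with the hypothetical names above:

# at the top of models/yolo.py, ABOVE the existing import of models.common
from models.modules import *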



4.1.4 Modification 4

Then locate the parse_model method in 'models/yolo.py' and modify it as follows ->
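Since the exact surrounding code depends on your copy of yolov9-main, the change is shown here only as an abbreviated sketch: locate the branch of parse_model that already registers RepNCSPELAN4 and add RepNCSPELAN4_AFF next to it.

# inside parse_model() in models/yolo.py (abbreviated sketch, stock yolov9-main assumed):
# find the membership test that already lists RepNCSPELAN4, e.g. a branch like
#     if m in (Conv, ..., RepNCSPELAN4):
# and extend it with the new class:
#     if m in (Conv, ..., RepNCSPELAN4, RepNCSPELAN4_AFF):
# so that the channel arguments of RepNCSPELAN4_AFF are parsed exactly like
# those of the stock RepNCSPELAN4 block.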

That completes all the modifications; copy the yaml file below and it will run.


4.2 The yaml File for AFF

Because this module can only be applied where a residual sum takes place, we simply replace the RepNCSPELAN4 blocks (YOLOv9's counterpart of the C3 block) with RepNCSPELAN4_AFF, as in the yaml below.

# YOLOv9

# parameters
nc: 80  # number of classes
depth_multiple: 1  # model depth multiple
width_multiple: 1  # layer channel multiple
#activation: nn.LeakyReLU(0.1)
#activation: nn.ReLU()

# anchors
anchors: 3

# YOLOv9 backbone
backbone:
  [
   [-1, 1, Silence, []],

   # conv down
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2

   # conv down
   [-1, 1, Conv, [128, 3, 2]],  # 2-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4_AFF, [256, 128, 64, 1]],  # 3

   # conv down
   [-1, 1, Conv, [256, 3, 2]],  # 4-P3/8

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 256, 128, 1]],  # 5

   # conv down
   [-1, 1, Conv, [512, 3, 2]],  # 6-P4/16

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 7

   # conv down
   [-1, 1, Conv, [512, 3, 2]],  # 8-P5/32

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 9
  ]

# YOLOv9 head
head:
  [
   # elan-spp block
   [-1, 1, SPPELAN, [512, 256]],  # 10

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 7], 1, Concat, [1]],  # cat backbone P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 13

   # up-concat merge
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 5], 1, Concat, [1]],  # cat backbone P3

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [256, 256, 128, 1]],  # 16 (P3/8-small)

   # conv-down merge
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 13], 1, Concat, [1]],  # cat head P4

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 19 (P4/16-medium)

   # conv-down merge
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 22 (P5/32-large)

   # routing
   [5, 1, CBLinear, [[256]]],  # 23
   [7, 1, CBLinear, [[256, 512]]],  # 24
   [9, 1, CBLinear, [[256, 512, 512]]],  # 25

   # conv down
   [0, 1, Conv, [64, 3, 2]],  # 26-P1/2

   # conv down
   [-1, 1, Conv, [128, 3, 2]],  # 27-P2/4

   # elan-1 block
   [-1, 1, RepNCSPELAN4_AFF, [256, 128, 64, 1]],  # 28

   # conv down fuse
   [-1, 1, Conv, [256, 3, 2]],  # 29-P3/8
   [[23, 24, 25, -1], 1, CBFuse, [[0, 0, 0]]],  # 30

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 256, 128, 1]],  # 31

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 32-P4/16
   [[24, 25, -1], 1, CBFuse, [[1, 1]]],  # 33

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 34

   # conv down fuse
   [-1, 1, Conv, [512, 3, 2]],  # 35-P5/32
   [[25, -1], 1, CBFuse, [[2]]],  # 36

   # elan-2 block
   [-1, 1, RepNCSPELAN4_AFF, [512, 512, 256, 1]],  # 37

   # detect
   [[31, 34, 37, 16, 19, 22], 1, DualDDetect, [nc]],  # DualDDetect(A3, A4, A5, P3, P4, P5)
  ]

4.3 Training Screenshots

You can check the training output below and the locations where the module was added, so there is no question of the published code being incomplete or failing to run. If you run into problems, leave a comment; I will answer everything I see (and know the answer to).




V. Summary

That concludes the formal content of this article. Here I again recommend my YOLOv9 effective-improvements column. It is newly opened, with an average quality score of 98; going forward I will reproduce papers from the latest top conferences and also back-fill some older improvement mechanisms. If this article helped you, consider subscribing to the column and following it for future updates~

If you found this useful, please like and comment to support the article; the more people subscribe to the column, the more members the group has and the more chances everyone gets to exchange ideas.

Column link: YOLOv9 Effective Improvements Column - continuously reproducing the latest top-conference work - effective gains - the most complete improvement column on the web
