
Step-by-Step: Adding Attention Mechanisms to YOLOv5/v7

Introduction to Attention Mechanisms

The attention mechanism originates from the study of human vision. In cognitive science, because of bottlenecks in information processing, humans selectively focus on part of the available information while ignoring the rest. To make good use of limited visual processing resources, we pick out a specific region of the visual field and concentrate on it; when reading, for example, only a small number of the words to be read are actually attended to and processed. In short, an attention mechanism does two things: it decides which part of the input to attend to, and it allocates the limited processing resources to the important parts. The number of attention-related papers has grown rapidly in recent years; the figure below shows this growth at top conferences, including CVPR, ICCV, ECCV, NeurIPS, ICML and ICLR. In the rest of this article I will show how to add attention mechanisms to YOLOv5 v6.1 and list, as of April 2022, 30 strong attention modules proposed at top conferences.

Figure: growth in the number of attention-related papers at top conferences, including CVPR, ICCV, ECCV, NeurIPS, ICML and ICLR.


Classification of Attention Mechanisms

Figure: taxonomy of attention mechanisms.


1. SE Attention Module

Paper: Squeeze-and-Excitation Networks

Paper link: https://arxiv.org/pdf/1709.01507.pdf

Code: https://github.com/hujie-frank/SENet

1.1 Principle

SENet (Squeeze-and-Excitation Network) models the relationships between feature channels by adding an attention mechanism along the channel dimension.

SENet learns the importance of each feature channel automatically and uses the learned importance to strengthen useful features while suppressing features that matter little for the current task. It does this with two modules: Squeeze and Excitation.

Figure: the SE (Squeeze-and-Excitation) block.

As shown in the figure, the squeeze operation first compresses the spatial dimensions: each feature map is globally pooled into a single scalar, which in a sense has a global receptive field. The authors point out that this gives even layers close to the input access to global context, which is useful in many tasks. Next comes the excitation operation: after the squeeze step the output is a $1 \times 1 \times C$ tensor, and a weight $w$ is learned to model the correlations among the $C$ channels; in practice some implementations use fully connected layers and others use $1 \times 1$ convolutions. The $C$ channels are first reduced in dimension and then expanded back to $C$, which both lowers the computational cost and adds non-linearity. Finally, the excitation output is treated as the importance of each channel after feature selection and is multiplied channel-wise onto the original features, thereby strengthening important features and suppressing unimportant ones.

1.2 Code

# SE
class SE(nn.Module):
    def __init__(self, c1, c2, ratio=16):
        super(SE, self).__init__()
        # c*1*1
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.l1 = nn.Linear(c1, c1 // ratio, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.l2 = nn.Linear(c1 // ratio, c1, bias=False)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avgpool(x).view(b, c)
        y = self.l1(y)
        y = self.relu(y)
        y = self.l2(y)
        y = self.sig(y)
        y = y.view(b, c, 1, 1)
        return x * y.expand_as(x)
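As a quick sanity check (my own addition, not part of the original post), the module can be run on a random tensor; since SE only rescales channels, the output shape must equal the input shape (this assumes the SE class above and its torch.nn import are already in scope):

import torch

x = torch.randn(2, 64, 32, 32)     # two feature maps with 64 channels
se = SE(c1=64, c2=64, ratio=16)    # c2 is unused by SE itself; it is kept only for YOLOv5's layer-parsing interface
y = se(x)
print(y.shape)                     # torch.Size([2, 64, 32, 32])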

Here is a screenshot from one of my own experiments; I added the SE layer at layer 9, and the pink curve is the run with SE attention added.

Figure: experiment results.


2. CBAM Attention Module

Paper: CBAM: Convolutional Block Attention Module

Paper link: https://arxiv.org/pdf/1807.06521.pdf

2.1 Principle

CBAM (Convolutional Block Attention Module) combines attention over both feature channels and spatial locations.

Figure: overall structure of CBAM.

Like SENet, CBAM learns the importance of each feature channel automatically; in addition, it learns the importance of each spatial location in the same way, and uses both to strengthen useful features and suppress features that are irrelevant to the current task.

Figure: the channel attention module (CAM).

CBAM computes channel attention in much the same way as SENet, as the ChannelAttention code below shows; it adds a max-pooling path on top of SENet, with the remaining steps unchanged. The features reweighted by channel attention are then fed into the spatial attention module.

Figure: the spatial attention module (SAM).

Spatial attention in CBAM works as follows: after ChannelAttention, the channel-reweighted feature map is passed to the spatial attention module. Analogous to the channel branch, spatial attention applies max pooling and average pooling per location across the channel dimension, concatenates the two maps, and reduces them with a convolution to a $1 \times h \times w$ spatial weight map, which is then multiplied element-wise with the input features to realize spatial attention.

2.2 Code

# CBAM
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.f1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu = nn.ReLU()
        self.f2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.f2(self.relu(self.f1(self.avg_pool(x))))
        max_out = self.f2(self.relu(self.f1(self.max_pool(x))))
        out = self.sigmoid(avg_out + max_out)
        return out


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        # output size = (input size - kernel size + 2 * padding) / stride + 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # 1*h*w
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        # 2*h*w
        x = self.conv(x)
        # 1*h*w
        return self.sigmoid(x)


class CBAM(nn.Module):
    def __init__(self, c1, c2, ratio=16, kernel_size=7):  # ch_in, ch_out, reduction ratio, spatial kernel size
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttention(c1, ratio)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        # c*h*w
        out = self.channel_attention(x) * x
        # c*h*w * 1*h*w
        out = self.spatial_attention(out) * out
        return out
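A rough usage sketch (mine, not the author's) that makes the two-stage composition explicit: channel attention produces one weight per channel, spatial attention one weight per location, and CBAM applies them in that order:

import torch

x = torch.randn(1, 128, 40, 40)
cbam = CBAM(c1=128, c2=128)                   # c2 is kept only for YOLOv5's parsing convention
ca_map = cbam.channel_attention(x)            # (1, 128, 1, 1): per-channel weights
sa_map = cbam.spatial_attention(x * ca_map)   # (1, 1, 40, 40): per-location weights
print(ca_map.shape, sa_map.shape, cbam(x).shape)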

3. ECA Attention Module

Paper: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Paper link: https://arxiv.org/abs/1910.03151

Code: https://github.com/BangguWu/ECANet

3.1 Principle

Most previous methods pursue better performance by designing more sophisticated attention modules, which inevitably increases model complexity. To resolve this trade-off between performance and complexity, the authors propose the Efficient Channel Attention (ECA) module, which adds only a handful of parameters yet brings a clear performance gain.

Figure: the ECA module.

3.2 Code

class ECA(nn.Module):
    """Constructs an ECA module.

    Args:
        channel: Number of channels of the input feature map
        k_size: Adaptive selection of kernel size
    """
    def __init__(self, c1, c2, k_size=3):
        super(ECA, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # feature descriptor on the global spatial information
        y = self.avg_pool(x)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Multi-scale information fusion
        y = self.sigmoid(y)
        return x * y.expand_as(x)
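The snippet above fixes k_size=3. In the ECA paper the 1-D kernel size is instead chosen adaptively from the channel count as k = |log2(C)/γ + b/γ|, forced to be odd, with γ = 2 and b = 1; a small helper along those lines (my own sketch, not from the original post) could be:

import math

def eca_kernel_size(channels, gamma=2, b=1):
    # adaptive kernel size from the ECA paper: k = |log2(C)/gamma + b/gamma|, forced to be odd
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

# eca_kernel_size(64) == 3, eca_kernel_size(256) == 5
eca = ECA(c1=256, c2=256, k_size=eca_kernel_size(256))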

4. CA Attention Module

Paper: Coordinate Attention for Efficient Mobile Network Design

Paper link: https://arxiv.org/abs/2103.02907

4.1 Principle

Attention in earlier lightweight networks mostly relied on the SE module, which only models inter-channel information and ignores positional information. BAM and CBAM later tried to recover positional attention by applying convolutions after reducing the channel dimension, but convolutions capture only local relations and cannot model long-range dependencies. The paper therefore proposes coordinate attention (CA), a new efficient attention mechanism that encodes horizontal and vertical positional information into channel attention, so that a mobile network can attend over large regions at little extra computational cost.

The main advantages of coordinate attention are:

  • it captures not only inter-channel information but also direction-aware positional information, which helps the model locate and recognize targets more accurately;
  • it is flexible and lightweight enough to be plugged straight into the core blocks of mobile networks;
  • used as a pretrained component, it transfers to many downstream tasks such as detection and segmentation with consistent gains.

Figure: the Coordinate Attention (CA) module.

4.2 Code

# CA
class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class CoordAtt(nn.Module):
    def __init__(self, inp, oup, reduction=32):
        super(CoordAtt, self).__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (n, c, 1, w)
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()
        self.conv_h = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x
        n, c, h, w = x.size()
        x_h = self.pool_h(x)                       # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)           # (n, c, h + w, 1)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)
        a_h = self.conv_h(x_h).sigmoid()
        a_w = self.conv_w(x_w).sigmoid()
        out = identity * a_w * a_h
        return out
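A rough usage sketch (my own): CoordAtt preserves both the channel count and the spatial size, and the reduction argument only controls the width of the internal bottleneck:

import torch

x = torch.randn(1, 256, 20, 20)
ca = CoordAtt(inp=256, oup=256, reduction=32)   # bottleneck width: max(8, 256 // 32) = 8 channels
y = ca(x)
print(y.shape)   # torch.Size([1, 256, 20, 20])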

5. How to Add an Attention Module

The overall procedure

Adding an attention mechanism to YOLOv5 or YOLOv7 takes five steps. Using SE in yolov5s as the example:

  1. create a new yolov5s_SE.yaml in the yolov5/models folder;
  2. append the SE attention code given above to the end of common.py;
  3. register the class name SE in yolov5/models/yolo.py;
  4. edit yolov5s_SE.yaml and insert the SE attention at the position you want;
  5. change the '--cfg' default in train.py, and start training.

The detailed procedure

  • Step 1: create a new yolov5s_SE.yaml in yolov5/models and copy the contents of yolov5s.yaml into it; it will be edited in step 4.
  • Step 2: append the SE attention code given above to the end of yolov5/models/common.py:
class SE(nn.Module):
    def __init__(self, c1, c2, ratio=16):
        super(SE, self).__init__()
        # c*1*1
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.l1 = nn.Linear(c1, c1 // ratio, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.l2 = nn.Linear(c1 // ratio, c1, bias=False)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avgpool(x).view(b, c)
        y = self.l1(y)
        y = self.relu(y)
        y = self.l2(y)
        y = self.sig(y)
        y = y.view(b, c, 1, 1)
        return x * y.expand_as(x)
  • Step 3: register the class name SE at the appropriate place in yolov5/models/yolo.py.

Your yolo.py may differ slightly from mine; that is fine.
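For reference, one way to register SE, following the same elif pattern this article uses later for SOCA, SimAM and the other modules, is to add a branch like the following inside parse_model(); treat it as a sketch of my own and adapt it to your local yolo.py:

elif m is SE:
    c1, c2 = ch[f], args[0]             # input channels and the channel value written in the yaml
    if c2 != no:                        # scale by width_multiple and keep it divisible by 8
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]          # SE(c1, c2, ...); c2 is only kept for interface consistency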

  • Step 4: edit yolov5s_SE.yaml and insert the SE attention where you want it. Common choices are after a C3 block or in the neck; you can also add an extra layer right before SPPF in the backbone. Here I demonstrate the latter: insert [-1, 1, SE, [1024]], as the layer directly above SPPF.

Adding that line is not the whole story; there are two more details to take care of.

Once a new layer is inserted, the index of every layer after it shifts by one. Originally Detect takes its inputs from layers [17, 20, 23]; after adding the SE layer, layer 17 becomes 18, layer 20 becomes 21, and layer 23 becomes 24, so the from field of Detect has to be changed to [18, 21, 24].

Figure: the original yolov5s.yaml (left) and the modified yolov5s_SE.yaml (right).

Likewise, the from indices of the Concat layers must be updated so that the overall network topology stays unchanged. Since the SE layer was inserted as layer 9, every layer index from 9 onwards increases by 1, so the last two Concat entries change from [-1, 14] and [-1, 10] to [-1, 15] and [-1, 11].

Figure: the original yolov5s.yaml (left) and the modified yolov5s_SE.yaml (right).

If the reasoning behind this step is still unclear, my Bilibili video walks through how the yaml file works: link.
  • Step 5: change the '--cfg' default argument in train.py (set default= to the path of yolov5s_SE.yaml), and start training.

The model structure is printed when training starts; if the SE layer appears in the printed table, the module has been added successfully.

Finally, here is my complete configuration file yolov5s_SE.yaml with the SE attention layer added:

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone + SE
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SE, [1024]],  # 9 SE
   [-1, 1, SPPF, [1024, 5]],  # 10
  ]

# YOLOv5 + SE v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 14
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 18 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 15], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 21 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 11], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)
   [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
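Before launching a full training run, it is worth building the model once directly from the new yaml. This is my own quick check (run from the yolov5 repository root, assuming the v6.1 code base where the detection model class is Model in models/yolo.py):

import torch
from models.yolo import Model

model = Model('models/yolov5s_SE.yaml', ch=3, nc=80)   # parses the yaml and prints the layer table
_ = model(torch.zeros(1, 3, 640, 640))                 # dummy forward pass to confirm the shapes line up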

6. SOCA Attention Module

Paper link: https://openaccess.thecvf.com.pdf

Figure: the SOCA module.

Code

import numpy as np
import torch
from torch import nn
from torch.nn import init
from torch.autograd import Function


class Covpool(Function):
    @staticmethod
    def forward(ctx, input):
        x = input
        batchSize = x.data.shape[0]
        dim = x.data.shape[1]
        h = x.data.shape[2]
        w = x.data.shape[3]
        M = h * w
        x = x.reshape(batchSize, dim, M)
        I_hat = (-1. / M / M) * torch.ones(M, M, device=x.device) + (1. / M) * torch.eye(M, M, device=x.device)
        I_hat = I_hat.view(1, M, M).repeat(batchSize, 1, 1).type(x.dtype)
        y = x.bmm(I_hat).bmm(x.transpose(1, 2))
        ctx.save_for_backward(input, I_hat)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        input, I_hat = ctx.saved_tensors
        x = input
        batchSize = x.data.shape[0]
        dim = x.data.shape[1]
        h = x.data.shape[2]
        w = x.data.shape[3]
        M = h * w
        x = x.reshape(batchSize, dim, M)
        grad_input = grad_output + grad_output.transpose(1, 2)
        grad_input = grad_input.bmm(x).bmm(I_hat)
        grad_input = grad_input.reshape(batchSize, dim, h, w)
        return grad_input


class Sqrtm(Function):
    @staticmethod
    def forward(ctx, input, iterN):
        x = input
        batchSize = x.data.shape[0]
        dim = x.data.shape[1]
        dtype = x.dtype
        I3 = 3.0 * torch.eye(dim, dim, device=x.device).view(1, dim, dim).repeat(batchSize, 1, 1).type(dtype)
        normA = (1.0 / 3.0) * x.mul(I3).sum(dim=1).sum(dim=1)
        A = x.div(normA.view(batchSize, 1, 1).expand_as(x))
        Y = torch.zeros(batchSize, iterN, dim, dim, requires_grad=False, device=x.device)
        Z = torch.eye(dim, dim, device=x.device).view(1, dim, dim).repeat(batchSize, iterN, 1, 1)
        if iterN < 2:
            ZY = 0.5 * (I3 - A)
            Y[:, 0, :, :] = A.bmm(ZY)
        else:
            ZY = 0.5 * (I3 - A)
            Y[:, 0, :, :] = A.bmm(ZY)
            Z[:, 0, :, :] = ZY
            for i in range(1, iterN - 1):
                ZY = 0.5 * (I3 - Z[:, i - 1, :, :].bmm(Y[:, i - 1, :, :]))
                Y[:, i, :, :] = Y[:, i - 1, :, :].bmm(ZY)
                Z[:, i, :, :] = ZY.bmm(Z[:, i - 1, :, :])
            ZY = 0.5 * Y[:, iterN - 2, :, :].bmm(I3 - Z[:, iterN - 2, :, :].bmm(Y[:, iterN - 2, :, :]))
        y = ZY * torch.sqrt(normA).view(batchSize, 1, 1).expand_as(x)
        ctx.save_for_backward(input, A, ZY, normA, Y, Z)
        ctx.iterN = iterN
        return y

    @staticmethod
    def backward(ctx, grad_output):
        input, A, ZY, normA, Y, Z = ctx.saved_tensors
        iterN = ctx.iterN
        x = input
        batchSize = x.data.shape[0]
        dim = x.data.shape[1]
        dtype = x.dtype
        der_postCom = grad_output * torch.sqrt(normA).view(batchSize, 1, 1).expand_as(x)
        der_postComAux = (grad_output * ZY).sum(dim=1).sum(dim=1).div(2 * torch.sqrt(normA))
        I3 = 3.0 * torch.eye(dim, dim, device=x.device).view(1, dim, dim).repeat(batchSize, 1, 1).type(dtype)
        if iterN < 2:
            # the original snippet referenced an undefined `der_sacleTrace` here;
            # using der_postCom matches the iSQRT-COV reference implementation
            der_NSiter = 0.5 * (der_postCom.bmm(I3 - A) - A.bmm(der_postCom))
        else:
            dldY = 0.5 * (der_postCom.bmm(I3 - Y[:, iterN - 2, :, :].bmm(Z[:, iterN - 2, :, :])) -
                          Z[:, iterN - 2, :, :].bmm(Y[:, iterN - 2, :, :]).bmm(der_postCom))
            dldZ = -0.5 * Y[:, iterN - 2, :, :].bmm(der_postCom).bmm(Y[:, iterN - 2, :, :])
            for i in range(iterN - 3, -1, -1):
                YZ = I3 - Y[:, i, :, :].bmm(Z[:, i, :, :])
                ZY = Z[:, i, :, :].bmm(Y[:, i, :, :])
                dldY_ = 0.5 * (dldY.bmm(YZ) -
                               Z[:, i, :, :].bmm(dldZ).bmm(Z[:, i, :, :]) -
                               ZY.bmm(dldY))
                dldZ_ = 0.5 * (YZ.bmm(dldZ) -
                               Y[:, i, :, :].bmm(dldY).bmm(Y[:, i, :, :]) -
                               dldZ.bmm(ZY))
                dldY = dldY_
                dldZ = dldZ_
            der_NSiter = 0.5 * (dldY.bmm(I3 - A) - dldZ - A.bmm(dldY))
        grad_input = der_NSiter.div(normA.view(batchSize, 1, 1).expand_as(x))
        grad_aux = der_NSiter.mul(x).sum(dim=1).sum(dim=1)
        for i in range(batchSize):
            grad_input[i, :, :] += (der_postComAux[i]
                                    - grad_aux[i] / (normA[i] * normA[i])) \
                                   * torch.ones(dim, device=x.device).diag()
        return grad_input, None


def CovpoolLayer(var):
    return Covpool.apply(var)


def SqrtmLayer(var, iterN):
    return Sqrtm.apply(var, iterN)


class SOCA(nn.Module):
    # second-order channel attention
    def __init__(self, channel, reduction=8):
        super(SOCA, self).__init__()
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.conv_du = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, padding=0, bias=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, 1, padding=0, bias=True),
            nn.Sigmoid()
        )

    def forward(self, x):
        batch_size, C, h, w = x.shape  # x: NxCxHxW
        N = int(h * w)
        min_h = min(h, w)
        # crop overly large feature maps before covariance pooling
        h1 = 1000
        w1 = 1000
        if h < h1 and w < w1:
            x_sub = x
        elif h < h1 and w > w1:
            W = (w - w1) // 2
            x_sub = x[:, :, :, W:(W + w1)]
        elif w < w1 and h > h1:
            H = (h - h1) // 2
            x_sub = x[:, :, H:H + h1, :]
        else:
            H = (h - h1) // 2
            W = (w - w1) // 2
            x_sub = x[:, :, H:(H + h1), W:(W + w1)]
        cov_mat = CovpoolLayer(x_sub)  # global covariance pooling layer
        cov_mat_sqrt = SqrtmLayer(cov_mat, 5)  # matrix square root layer (pre-norm, Newton-Schulz iteration and post-compensation, 5 iterations)
        cov_mat_sum = torch.mean(cov_mat_sqrt, 1)
        cov_mat_sum = cov_mat_sum.view(batch_size, C, 1, 1)
        y_cov = self.conv_du(cov_mat_sum)
        return y_cov * x

yolov5/models/yolo.py 的如下位置添加下面的判断语句,虽然和上面介绍的添加方式不同,但是原理都是一样的。

  1. elif m is SOCA:
  2. c1, c2 = ch[f], args[0]
  3. if c2 != no:
  4. c2 = make_divisible(c2 * gw, 8)
  5. args = [c1, *args[1:]]

The full yolov5s_SOCA.yaml:

# Parameters
nc: 20  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone + SOCA
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SOCA, [1024]],  # 9 SOCA
   [-1, 1, SPPF, [1024, 5]],  # 10
  ]

# YOLOv5 v6.1 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 14
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 18 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 15], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 21 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 11], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)
   [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

7. SimAM Attention Module

Paper link: http://proceedings.mlr.press/v139/yang21o/yang21o.pdf

Figure: the SimAM module.

Code

import torch
import torch.nn as nn


class SimAM(torch.nn.Module):
    def __init__(self, channels=None, out_channels=None, e_lambda=1e-4):
        super(SimAM, self).__init__()
        self.activation = nn.Sigmoid()
        self.e_lambda = e_lambda

    def __repr__(self):
        s = self.__class__.__name__ + '('
        s += ('lambda=%f)' % self.e_lambda)
        return s

    @staticmethod
    def get_module_name():
        return "simam"

    def forward(self, x):
        b, c, h, w = x.size()
        n = w * h - 1
        x_minus_mu_square = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        y = x_minus_mu_square / (4 * (x_minus_mu_square.sum(dim=[2, 3], keepdim=True) / n + self.e_lambda)) + 0.5
        return x * self.activation(y)

The branch to add in yolov5/models/yolo.py:

elif m is SimAM:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2]

The corresponding layer entry in the yaml:

[-1, 1, SimAM, [1024]],
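One appealing property of SimAM is that it is parameter-free, which is easy to verify (my own check, assuming the class above is in scope):

import torch

sim = SimAM(channels=1024, out_channels=1024)     # both arguments are unused by the module itself
print(sum(p.numel() for p in sim.parameters()))   # 0 -- SimAM adds no learnable weights
x = torch.randn(1, 1024, 8, 8)
print(sim(x).shape)                               # torch.Size([1, 1024, 8, 8])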

8. S2-MLPv2 Attention Module

Paper link: https://arxiv.org/abs/2108.01072

Figure: the S2-MLPv2 module.

Code

import numpy as np
import torch
from torch import nn
from torch.nn import init


# https://arxiv.org/abs/2108.01072
def spatial_shift1(x):
    b, w, h, c = x.size()
    x[:, 1:, :, :c // 4] = x[:, :w - 1, :, :c // 4]
    x[:, :w - 1, :, c // 4:c // 2] = x[:, 1:, :, c // 4:c // 2]
    x[:, :, 1:, c // 2:c * 3 // 4] = x[:, :, :h - 1, c // 2:c * 3 // 4]
    x[:, :, :h - 1, 3 * c // 4:] = x[:, :, 1:, 3 * c // 4:]
    return x


def spatial_shift2(x):
    b, w, h, c = x.size()
    x[:, :, 1:, :c // 4] = x[:, :, :h - 1, :c // 4]
    x[:, :, :h - 1, c // 4:c // 2] = x[:, :, 1:, c // 4:c // 2]
    x[:, 1:, :, c // 2:c * 3 // 4] = x[:, :w - 1, :, c // 2:c * 3 // 4]
    x[:, :w - 1, :, 3 * c // 4:] = x[:, 1:, :, 3 * c // 4:]
    return x


class SplitAttention(nn.Module):
    def __init__(self, channel=512, k=3):
        super().__init__()
        self.channel = channel
        self.k = k
        self.mlp1 = nn.Linear(channel, channel, bias=False)
        self.gelu = nn.GELU()
        self.mlp2 = nn.Linear(channel, channel * k, bias=False)
        self.softmax = nn.Softmax(1)

    def forward(self, x_all):
        b, k, h, w, c = x_all.shape
        x_all = x_all.reshape(b, k, -1, c)
        a = torch.sum(torch.sum(x_all, 1), 1)
        hat_a = self.mlp2(self.gelu(self.mlp1(a)))
        hat_a = hat_a.reshape(b, self.k, c)
        bar_a = self.softmax(hat_a)
        attention = bar_a.unsqueeze(-2)
        out = attention * x_all
        out = torch.sum(out, 1).reshape(b, h, w, c)
        return out


class S2Attention(nn.Module):
    def __init__(self, channels=512):
        super().__init__()
        self.mlp1 = nn.Linear(channels, channels * 3)
        self.mlp2 = nn.Linear(channels, channels)
        self.split_attention = SplitAttention(channels)  # pass channels through so the module also works when channels != 512

    def forward(self, x):
        b, c, w, h = x.size()
        x = x.permute(0, 2, 3, 1)
        x = self.mlp1(x)
        x1 = spatial_shift1(x[:, :, :, :c])
        x2 = spatial_shift2(x[:, :, :, c:c * 2])
        x3 = x[:, :, :, c * 2:]
        x_all = torch.stack([x1, x2, x3], 1)
        a = self.split_attention(x_all)
        x = self.mlp2(a)
        x = x.permute(0, 3, 1, 2)
        return x

The branch to add in yolov5/models/yolo.py (the original snippet did not rewrite args, which would build the module with the raw yaml value instead of the actual input channel count; passing c1 keeps it consistent with the other branches):

elif m is S2Attention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1]

The corresponding layer entry in the yaml:

[-1, 1, S2Attention, [1024]],

9. NAMAttention Attention Module

Paper link: https://arxiv.org/abs/2111.12419

Figure: the NAMAttention module.

Code

import torch.nn as nn
import torch
from torch.nn import functional as F


class Channel_Att(nn.Module):
    def __init__(self, channels, t=16):
        super(Channel_Att, self).__init__()
        self.channels = channels
        self.bn2 = nn.BatchNorm2d(self.channels, affine=True)

    def forward(self, x):
        residual = x
        x = self.bn2(x)
        # reweight each channel by its normalized BN scaling factor
        weight_bn = self.bn2.weight.data.abs() / torch.sum(self.bn2.weight.data.abs())
        x = x.permute(0, 2, 3, 1).contiguous()
        x = torch.mul(weight_bn, x)
        x = x.permute(0, 3, 1, 2).contiguous()
        x = torch.sigmoid(x) * residual
        return x


class NAMAttention(nn.Module):
    def __init__(self, channels, out_channels=None, no_spatial=True):
        super(NAMAttention, self).__init__()
        self.Channel_Att = Channel_Att(channels)

    def forward(self, x):
        x_out1 = self.Channel_Att(x)
        return x_out1

The branch to add in yolov5/models/yolo.py:

elif m is NAMAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, *args[1:]]

The corresponding layer entry in the yaml:

[-1, 1, NAMAttention, [1024]],

10. Criss-CrossAttention Attention Module

Paper link: https://arxiv.org/abs/1811.11721

Figure: the Criss-Cross Attention module.

Code

'''
This code is borrowed from Serge-weihao/CCNet-Pure-Pytorch
'''
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Softmax


def INF(B, H, W):
    return -torch.diag(torch.tensor(float("inf")).repeat(H), 0).unsqueeze(0).repeat(B * W, 1, 1)


class CrissCrossAttention(nn.Module):
    """Criss-Cross Attention Module"""
    def __init__(self, in_dim):
        super(CrissCrossAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        self.softmax = Softmax(dim=3)
        self.INF = INF
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        m_batchsize, _, height, width = x.size()
        proj_query = self.query_conv(x)
        proj_query_H = proj_query.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height).permute(0, 2, 1)
        proj_query_W = proj_query.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width).permute(0, 2, 1)
        proj_key = self.key_conv(x)
        proj_key_H = proj_key.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_key_W = proj_key.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        proj_value = self.value_conv(x)
        proj_value_H = proj_value.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_value_W = proj_value.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        energy_H = (torch.bmm(proj_query_H, proj_key_H) + self.INF(m_batchsize, height, width)).view(m_batchsize, width, height, height).permute(0, 2, 1, 3)
        energy_W = torch.bmm(proj_query_W, proj_key_W).view(m_batchsize, height, width, width)
        concate = self.softmax(torch.cat([energy_H, energy_W], 3))
        att_H = concate[:, :, :, 0:height].permute(0, 2, 1, 3).contiguous().view(m_batchsize * width, height, height)
        att_W = concate[:, :, :, height:height + width].contiguous().view(m_batchsize * height, width, width)
        out_H = torch.bmm(proj_value_H, att_H.permute(0, 2, 1)).view(m_batchsize, width, -1, height).permute(0, 2, 3, 1)
        out_W = torch.bmm(proj_value_W, att_W.permute(0, 2, 1)).view(m_batchsize, height, -1, width).permute(0, 2, 1, 3)
        return self.gamma * (out_H + out_W) + x

The branch to add in yolov5/models/yolo.py:

elif m is CrissCrossAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, *args[1:]]

The corresponding layer entry in the yaml:

[-1, 1, CrissCrossAttention, [1024]],

11. GAMAttention Attention Module

Paper link: https://arxiv.org/pdf/2112.05561v1.pdf

Figure: the GAMAttention module.

Code

import numpy as np
import torch
from torch import nn
from torch.nn import init


class GAMAttention(nn.Module):
    # https://paperswithcode.com/paper/global-attention-mechanism-retain-information
    def __init__(self, c1, c2, group=True, rate=4):
        super(GAMAttention, self).__init__()
        self.channel_attention = nn.Sequential(
            nn.Linear(c1, int(c1 / rate)),
            nn.ReLU(inplace=True),
            nn.Linear(int(c1 / rate), c1)
        )
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(c1, c1 // rate, kernel_size=7, padding=3, groups=rate) if group else nn.Conv2d(c1, int(c1 / rate), kernel_size=7, padding=3),
            nn.BatchNorm2d(int(c1 / rate)),
            nn.ReLU(inplace=True),
            nn.Conv2d(c1 // rate, c2, kernel_size=7, padding=3, groups=rate) if group else nn.Conv2d(int(c1 / rate), c2, kernel_size=7, padding=3),
            nn.BatchNorm2d(c2)
        )

    def forward(self, x):
        b, c, h, w = x.shape
        x_permute = x.permute(0, 2, 3, 1).view(b, -1, c)
        x_att_permute = self.channel_attention(x_permute).view(b, h, w, c)
        x_channel_att = x_att_permute.permute(0, 3, 1, 2)
        x = x * x_channel_att
        x_spatial_att = self.spatial_attention(x).sigmoid()
        x_spatial_att = channel_shuffle(x_spatial_att, 4)  # last shuffle
        out = x * x_spatial_att
        return out


def channel_shuffle(x, groups=2):  # shuffle channels: reshape -> transpose -> flatten
    B, C, H, W = x.size()
    out = x.view(B, groups, C // groups, H, W).permute(0, 2, 1, 3, 4).contiguous()
    out = out.view(B, C, H, W)
    return out

The branch to add in yolov5/models/yolo.py (the original snippet did not rewrite args; passing the scaled channels keeps it consistent with the other branches):

elif m is GAMAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2]

The corresponding layer entry in the yaml:

[-1, 1, GAMAttention, [1024, 1024]],

12. Selective Kernel Attention Module

Paper link: https://arxiv.org/pdf/1903.06586.pdf

Figure: the Selective Kernel (SK) attention module.

Code

import torch
from torch import nn
from collections import OrderedDict


class SKAttention(nn.Module):
    def __init__(self, channel=512, kernels=[1, 3, 5, 7], reduction=16, group=1, L=32):
        super().__init__()
        self.d = max(L, channel // reduction)
        self.convs = nn.ModuleList([])
        for k in kernels:
            self.convs.append(
                nn.Sequential(OrderedDict([
                    ('conv', nn.Conv2d(channel, channel, kernel_size=k, padding=k // 2, groups=group)),
                    ('bn', nn.BatchNorm2d(channel)),
                    ('relu', nn.ReLU())
                ]))
            )
        self.fc = nn.Linear(channel, self.d)
        self.fcs = nn.ModuleList([])
        for i in range(len(kernels)):
            self.fcs.append(nn.Linear(self.d, channel))
        self.softmax = nn.Softmax(dim=0)

    def forward(self, x):
        bs, c, _, _ = x.size()
        conv_outs = []
        # split
        for conv in self.convs:
            conv_outs.append(conv(x))
        feats = torch.stack(conv_outs, 0)  # k, bs, channel, h, w
        # fuse
        U = sum(conv_outs)  # bs, c, h, w
        # reduce channels
        S = U.mean(-1).mean(-1)  # bs, c
        Z = self.fc(S)  # bs, d
        # calculate attention weights
        weights = []
        for fc in self.fcs:
            weight = fc(Z)
            weights.append(weight.view(bs, c, 1, 1))  # bs, channel
        attention_weights = torch.stack(weights, 0)  # k, bs, channel, 1, 1
        attention_weights = self.softmax(attention_weights)  # k, bs, channel, 1, 1
        # fuse
        V = (attention_weights * feats).sum(0)
        return V

The branch to add in yolov5/models/yolo.py:

elif m is SKAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, *args[1:]]

The corresponding layer entry in the yaml:

[-1, 1, SKAttention, [1024]],

13. ShuffleAttention Attention Module

Paper link: https://arxiv.org/pdf/2102.00240.pdf

Figure: the ShuffleAttention (SA) module.

Code

import numpy as np
import torch
from torch import nn
from torch.nn import init
from torch.nn.parameter import Parameter


# https://arxiv.org/pdf/2102.00240.pdf
class ShuffleAttention(nn.Module):
    def __init__(self, channel=512, reduction=16, G=8):
        super().__init__()
        self.G = G
        self.channel = channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.gn = nn.GroupNorm(channel // (2 * G), channel // (2 * G))
        self.cweight = Parameter(torch.zeros(1, channel // (2 * G), 1, 1))
        self.cbias = Parameter(torch.ones(1, channel // (2 * G), 1, 1))
        self.sweight = Parameter(torch.zeros(1, channel // (2 * G), 1, 1))
        self.sbias = Parameter(torch.ones(1, channel // (2 * G), 1, 1))
        self.sigmoid = nn.Sigmoid()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, std=0.001)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    @staticmethod
    def channel_shuffle(x, groups):
        b, c, h, w = x.shape
        x = x.reshape(b, groups, -1, h, w)
        x = x.permute(0, 2, 1, 3, 4)
        # flatten
        x = x.reshape(b, -1, h, w)
        return x

    def forward(self, x):
        b, c, h, w = x.size()
        # group into sub-features
        x = x.view(b * self.G, -1, h, w)  # bs*G, c//G, h, w
        # channel split
        x_0, x_1 = x.chunk(2, dim=1)  # bs*G, c//(2*G), h, w
        # channel attention
        x_channel = self.avg_pool(x_0)  # bs*G, c//(2*G), 1, 1
        x_channel = self.cweight * x_channel + self.cbias  # bs*G, c//(2*G), 1, 1
        x_channel = x_0 * self.sigmoid(x_channel)
        # spatial attention
        x_spatial = self.gn(x_1)  # bs*G, c//(2*G), h, w
        x_spatial = self.sweight * x_spatial + self.sbias  # bs*G, c//(2*G), h, w
        x_spatial = x_1 * self.sigmoid(x_spatial)  # bs*G, c//(2*G), h, w
        # concatenate along the channel axis
        out = torch.cat([x_channel, x_spatial], dim=1)  # bs*G, c//G, h, w
        out = out.contiguous().view(b, -1, h, w)
        # channel shuffle
        out = self.channel_shuffle(out, 2)
        return out

The branch to add in yolov5/models/yolo.py:

elif m is ShuffleAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]

The corresponding layer entry in the yaml:

[-1, 1, ShuffleAttention, [1024]],

14. A2-Net Attention Module

Paper link: https://arxiv.org/pdf/1810.11579.pdf

Figure: the A2-Net (double attention) module.

Code

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init


class DoubleAttention(nn.Module):
    """
    A2-Nets: Double Attention Networks. NIPS 2018
    """
    def __init__(self, in_channels, c_m, c_n, reconstruct=True):
        super().__init__()
        self.in_channels = in_channels
        self.reconstruct = reconstruct
        self.c_m = c_m
        self.c_n = c_n
        self.convA = nn.Conv2d(in_channels, c_m, 1)
        self.convB = nn.Conv2d(in_channels, c_n, 1)
        self.convV = nn.Conv2d(in_channels, c_n, 1)
        if self.reconstruct:
            self.conv_reconstruct = nn.Conv2d(c_m, in_channels, kernel_size=1)
        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, std=0.001)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    def forward(self, x):
        b, c, h, w = x.shape
        assert c == self.in_channels
        A = self.convA(x)  # b, c_m, h, w
        B = self.convB(x)  # b, c_n, h, w
        V = self.convV(x)  # b, c_n, h, w
        tmpA = A.view(b, self.c_m, -1)
        attention_maps = F.softmax(B.view(b, self.c_n, -1), dim=1)
        attention_vectors = F.softmax(V.view(b, self.c_n, -1), dim=1)
        # step 1: feature gating
        global_descriptors = torch.bmm(tmpA, attention_maps.permute(0, 2, 1))  # b, c_m, c_n
        # step 2: feature distribution
        tmpZ = global_descriptors.matmul(attention_vectors)  # b, c_m, h*w
        tmpZ = tmpZ.view(b, self.c_m, h, w)  # b, c_m, h, w
        if self.reconstruct:
            tmpZ = self.conv_reconstruct(tmpZ)
        return tmpZ

The branch to add in yolov5/models/yolo.py:

elif m is DoubleAttention:
    c1, c2 = ch[f], args[0]
    if c2 != no:  # if not output
        c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]

The corresponding layer entry in the yaml:

[-1, 1, DoubleAttention, [1024, 256, 256]],
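A standalone usage sketch (mine): c_m and c_n set the sizes of the two attention codebooks, while the output keeps in_channels channels thanks to the final 1x1 reconstruction convolution:

import torch

att = DoubleAttention(in_channels=512, c_m=256, c_n=256)
x = torch.randn(1, 512, 20, 20)
print(att(x).shape)   # torch.Size([1, 512, 20, 20])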

15. More Attention Modules

The remaining attention modules listed in the article (up to April 2022) are:

  • DANPositional attention
  • DANChannel attention
  • RESNest attention
  • Harmonious attention
  • SpatialAttention
  • RANet attention
  • Co-excite attention
  • EfficientAttention
  • X-Linear attention
  • SlotAttention
  • Axial attention
  • RFA attention
  • Attention-BasedDropout
  • ReverseAttention
  • CrossAttention
  • Perceiver attention
  • Criss-CrossAttention (see section 10)
  • BoostedAttention
  • Prophet attention
  • S3TA attention
  • Self-CriticAttention
  • BayesianAttentionBeliefNetworks
  • Expectation-MaximizationAttention
  • GaussianAttention