
An Accessible Guide to CA (Coordinate Attention)

I. Resources

GitHub code: CoordAttention

Coordinate Attention

II. Background

Channel attention and spatial attention

For a detailed introduction to channel attention and spatial attention, see my other post: An Accessible Guide to the Channel Attention Module (CAM) and the Spatial Attention Module (SAM).

An attention mechanism tells the model what content to focus on and where to look. For a feature map of shape (batchsize, C, H, W) at some point in the network, channel attention exploits the channel dimension (C), while spatial attention exploits the positional dimensions (H, W).
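To make the distinction concrete, here is a minimal sketch (PyTorch, written for this post; `ChannelAttention` and `SpatialAttention` are illustrative modules in the spirit of SE [1] and CBAM [2], not library code):

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze (H, W), weight each channel."""
    def __init__(self, c, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),             # (N, C, H, W) -> (N, C, 1, 1)
            nn.Conv2d(c, c // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)                    # weights broadcast over (H, W)


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: squeeze C, weight each position."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)        # (N, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)         # (N, 1, H, W)
        a = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * a                             # weights broadcast over C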

III. CA (Coordinate Attention)

1. Comparison of attention methods

1.1 Structures of different attention methods

  • The SE module [1] works on the channel dimension of the feature map, learning which channels the model should attend to, as in figure (a);
  • CBAM (Convolutional Block Attention Module) [2] improves on SE by combining channel and spatial attention, as in figure (b);
  • However, CBAM extracts spatial attention with a local convolution, so it captures only local information and cannot model long-range dependencies. Its global pooling also discards positional information (indeed, any pooling loses some information). To address this, the authors of [3] proposed CA, which fully exploits positional information while keeping the computational cost under control, as in figure (c).

[Figure: structures of (a) SE, (b) CBAM, and (c) CA]

Notes

  • GAP: global average pooling;
  • GMP: global max pooling;
  • X Avg Pool: 1D horizontal global average pooling;
  • Y Avg Pool: 1D vertical global average pooling (written out as equations below).
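Concretely, the paper [3] formalizes the two directional pooling steps and the gating that follows as (restated here from the paper, for channel $c$ of input $x$):

$$
z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i), \qquad
z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)
$$

$$
\mathbf{f} = \delta\big(F_1\big([\mathbf{z}^h, \mathbf{z}^w]\big)\big), \qquad
\mathbf{g}^h = \sigma\big(F_h(\mathbf{f}^h)\big), \qquad
\mathbf{g}^w = \sigma\big(F_w(\mathbf{f}^w)\big)
$$

$$
y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)
$$

Here $F_1$, $F_h$ and $F_w$ are 1×1 convolutions, $\delta$ is a non-linear activation (h-swish in the paper), $\sigma$ is the sigmoid, and $\mathbf{f}$ is split along the spatial dimension into $\mathbf{f}^h \in \mathbb{R}^{C/r \times H}$ and $\mathbf{f}^w \in \mathbb{R}^{C/r \times W}$. The two implementations below follow these equations step by step.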

1.2 Performance of different attention methods

The figure below compares these attention methods on three classic vision tasks:

[Figure: performance of different attention methods on three vision tasks]

Notes

  • MBV2: MobileNetV2

2. Code implementation (Paddle)

# CA (Coordinate Attention)

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class CA(nn.Layer):
    def __init__(self, in_ch, reduction=32):
        super(CA, self).__init__()
        # Directional global average pooling:
        # pool_h averages over W (keeps H), pool_w averages over H (keeps W)
        self.pool_h = nn.AdaptiveAvgPool2D((None, 1))  # (N, C, H, W) -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2D((1, None))  # (N, C, H, W) -> (N, C, 1, W)

        mip = max(8, in_ch // reduction)  # bottleneck width, at least 8 channels

        # Shared 1x1 conv + BN + non-linearity over the concatenated descriptors
        self.conv1 = nn.Conv2D(in_ch, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2D(mip)
        self.act = nn.Hardswish()

        # Per-direction 1x1 convs that produce the attention maps
        self.conv_h = nn.Conv2D(mip, in_ch, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2D(mip, in_ch, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x

        n, c, h, w = x.shape
        x_h = self.pool_h(x)                          # (N, C, H, 1)
        x_w = self.pool_w(x).transpose([0, 1, 3, 2])  # (N, C, W, 1)

        # Concatenate along the spatial axis so one conv handles both directions
        y = paddle.concat([x_h, x_w], axis=2)         # (N, C, H+W, 1)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)

        # Split back into the two directional descriptors
        x_h, x_w = paddle.split(y, [h, w], axis=2)    # (N, mip, H, 1), (N, mip, W, 1)
        x_w = x_w.transpose([0, 1, 3, 2])             # (N, mip, 1, W)

        x_h = F.sigmoid(self.conv_h(x_h))             # attention along H: (N, C, H, 1)
        x_w = F.sigmoid(self.conv_w(x_w))             # attention along W: (N, C, 1, W)

        out = identity * x_w * x_h                    # broadcast gating

        return out

Validation

# validation
# input size = (64, 512, 14, 14) --> CA --> output size = (64, 512, 14, 14)

ca = CA(512)                          # in_ch
x = paddle.randn([64, 512, 14, 14])   # (batchsize, channel, H, W)
y = ca(x)
print(y.shape)                        # [64, 512, 14, 14]

3. Code implementation (PyTorch)

From the paper Coordinate Attention for Efficient Mobile Network Design:

import torch
import torch.nn as nn


class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, a cheap approximation of the sigmoid."""
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    """Hard swish: x * h_sigmoid(x), as used in MobileNetV3."""
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class CoordAtt(nn.Module):
    def __init__(self, inp, oup, reduction=32):
        super(CoordAtt, self).__init__()
        # Directional global average pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (N, C, H, W) -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (N, C, H, W) -> (N, C, 1, W)

        mip = max(8, inp // reduction)  # bottleneck width, at least 8 channels

        # Shared 1x1 conv + BN + non-linearity over the concatenated descriptors
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()

        # Per-direction 1x1 convs that produce the attention maps
        self.conv_h = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, oup, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        identity = x

        n, c, h, w = x.size()
        x_h = self.pool_h(x)                        # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)    # (N, C, W, 1)

        # Concatenate along the spatial axis so one conv handles both directions
        y = torch.cat([x_h, x_w], dim=2)            # (N, C, H+W, 1)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)  # non-linearity

        # Split back into the two directional descriptors
        x_h, x_w = torch.split(y, [h, w], dim=2)    # (N, mip, H, 1), (N, mip, W, 1)
        x_w = x_w.permute(0, 1, 3, 2)               # (N, mip, 1, W)

        a_h = self.conv_h(x_h).sigmoid()            # attention along H: (N, oup, H, 1)
        a_w = self.conv_w(x_w).sigmoid()            # attention along W: (N, oup, 1, W)

        out = identity * a_w * a_h                  # broadcast gating

        return out
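For symmetry with the Paddle section, a quick shape check (this snippet is added here for illustration and is not from the paper's repository):

# validation
# input size = (64, 512, 14, 14) --> CoordAtt --> output size = (64, 512, 14, 14)

ca = CoordAtt(512, 512)               # inp, oup
x = torch.randn(64, 512, 14, 14)      # (batchsize, channel, H, W)
y = ca(x)
print(y.shape)                        # torch.Size([64, 512, 14, 14])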

IV. References

[1] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.

[2] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 3-19.

[3] Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 13713-13722.
