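The AFPN paper introduces a new feature pyramid structure, the Asymptotic Feature Pyramid Network (AFPN). It targets multi-scale feature extraction in object detection, in particular the loss and degradation of feature information between non-adjacent levels. Its main advantages are:
Direct interaction between non-adjacent levels: AFPN fuses features asymptotically. It starts by fusing two low-level features of different resolutions, then gradually brings in higher-level features, and finally fuses the top-level feature of the backbone. This avoids the large semantic gap between non-adjacent levels that arises when they only interact indirectly, as in traditional methods.
Preserving detail and semantics: during fusion, low-level features are combined with semantic information from high-level features, and high-level features take in detailed information from low-level ones. The resulting features carry both fine detail and strong semantics, which helps detection accuracy.
Adaptive spatial fusion: features from different levels can carry conflicting information at the same spatial position. AFPN introduces an adaptive spatial fusion operation that selects the useful information during multi-level fusion instead of relying on plain element-wise addition, which makes the fusion more effective.
Better accuracy at lower cost: according to the reported experiments, adding AFPN to Faster R-CNN on MS COCO 2017 yields gains of 1.6% and 2.6% with ResNet-50 and ResNet-101 backbones, respectively. Compared with state-of-the-art feature pyramid networks, AFPN achieves better detection performance with the lowest computational cost (measured in FLOPs). On the one-stage detector YOLOv5, AFPN also improves performance while reducing the parameter count.
Broad applicability: AFPN improves detection performance on both two-stage and one-stage frameworks, which shows that the method is general and flexible.
Resource efficiency: AFPN has 21.0% more parameters than the baseline FPN, yet it has the lowest GFLOPs among all compared methods, mainly because of its reduced feature dimensions.
In summary, with its asymptotic fusion strategy and adaptive spatial fusion, AFPN addresses key problems in multi-scale feature fusion and offers a more efficient and accurate solution for object detection.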
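To make the adaptive spatial fusion idea concrete, here is a minimal pure-Python sketch for a single spatial position. The feature values and logits are made-up numbers, and the function names are illustrative rather than taken from the paper's code; in AFPN the logits come from small convolutions over the resized features.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_at_position(features, logits):
    """Weighted sum of per-level feature vectors at one (x, y) location.

    features: one equal-length channel vector per pyramid level
    logits:   one raw score per level (hypothetical values in this sketch)
    """
    weights = softmax(logits)
    channels = len(features[0])
    fused = [sum(w * f[c] for w, f in zip(weights, features)) for c in range(channels)]
    return weights, fused


# Two levels, three channels; the second level receives the higher score.
weights, fused = fuse_at_position(
    features=[[1.0, 0.0, 2.0], [3.0, 4.0, 2.0]],
    logits=[0.0, 1.0],
)
```

Because the weights are non-negative and sum to 1, every fused channel lies between the values contributed by the individual levels, whereas plain element-wise addition weights all levels equally.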
First, create a new file afpn.py in the models folder of YOLOv5/v7 and add the following code:
import torch
import torch.nn as nn
import torch.nn.functional as F

from models.common import *  # provides Conv


class Upsample(nn.Module):
    """Upsamples by 2x or 4x with a learnable transposed convolution."""

    def __init__(self, c1, c2, scale_factor=2):
        super().__init__()
        if scale_factor not in (2, 4):
            raise ValueError(f"scale_factor must be 2 or 4, got {scale_factor}")
        # kernel_size == stride, so the output is exactly scale_factor times larger
        self.cv1 = nn.ConvTranspose2d(c1, c2, scale_factor, scale_factor, 0, bias=True)

    def forward(self, x):
        return self.cv1(x)


class ASFF2(nn.Module):
    """ASFF2 module for YOLO AFPN head https://arxiv.org/abs/2306.15988"""

    def __init__(self, c1, c2, level=0):
        # c1: [low-level, high-level] input channels; c2 is unused, the output has c1[level] channels
        super().__init__()
        c1_l, c1_h = c1[0], c1[1]
        self.level = level
        self.dim = c1_l, c1_h
        self.inter_dim = self.dim[self.level]
        compress_c = 8  # channels of the per-level weight features

        if level == 0:
            self.stride_level_1 = Upsample(c1_h, self.inter_dim)  # upsample 2x
        if level == 1:
            self.stride_level_0 = Conv(c1_l, self.inter_dim, 2, 2, 0)  # downsample 2x

        self.weight_level_0 = Conv(self.inter_dim, compress_c, 1, 1)
        self.weight_level_1 = Conv(self.inter_dim, compress_c, 1, 1)

        self.weights_levels = nn.Conv2d(compress_c * 2, 2, kernel_size=1, stride=1, padding=0)
        self.conv = Conv(self.inter_dim, self.inter_dim, 3, 1)

    def forward(self, x):
        x_level_0, x_level_1 = x[0], x[1]

        # bring both inputs to the resolution and width of the target level
        if self.level == 0:
            level_0_resized = x_level_0
            level_1_resized = self.stride_level_1(x_level_1)
        elif self.level == 1:
            level_0_resized = self.stride_level_0(x_level_0)
            level_1_resized = x_level_1

        # per-pixel softmax over the two levels gives the fusion weights
        level_0_weight_v = self.weight_level_0(level_0_resized)
        level_1_weight_v = self.weight_level_1(level_1_resized)
        levels_weight_v = torch.cat((level_0_weight_v, level_1_weight_v), 1)
        levels_weight = self.weights_levels(levels_weight_v)
        levels_weight = F.softmax(levels_weight, dim=1)

        fused_out_reduced = level_0_resized * levels_weight[:, 0:1] + level_1_resized * levels_weight[:, 1:2]
        return self.conv(fused_out_reduced)


class ASFF3(nn.Module):
    """ASFF3 module for YOLO AFPN head https://arxiv.org/abs/2306.15988"""

    def __init__(self, c1, c2, level=0):
        # c1: [low, mid, high] input channels; c2 is unused, the output has c1[level] channels
        super().__init__()
        c1_l, c1_m, c1_h = c1[0], c1[1], c1[2]
        self.level = level
        self.dim = c1_l, c1_m, c1_h
        self.inter_dim = self.dim[self.level]
        compress_c = 8  # channels of the per-level weight features

        if level == 0:
            self.stride_level_1 = Upsample(c1_m, self.inter_dim)  # upsample 2x
            self.stride_level_2 = Upsample(c1_h, self.inter_dim, scale_factor=4)  # upsample 4x

        if level == 1:
            self.stride_level_0 = Conv(c1_l, self.inter_dim, 2, 2, 0)  # downsample 2x
            self.stride_level_2 = Upsample(c1_h, self.inter_dim)  # upsample 2x

        if level == 2:
            self.stride_level_0 = Conv(c1_l, self.inter_dim, 4, 4, 0)  # downsample 4x
            self.stride_level_1 = Conv(c1_m, self.inter_dim, 2, 2, 0)  # downsample 2x

        self.weight_level_0 = Conv(self.inter_dim, compress_c, 1, 1)
        self.weight_level_1 = Conv(self.inter_dim, compress_c, 1, 1)
        self.weight_level_2 = Conv(self.inter_dim, compress_c, 1, 1)

        self.weights_levels = nn.Conv2d(compress_c * 3, 3, kernel_size=1, stride=1, padding=0)
        self.conv = Conv(self.inter_dim, self.inter_dim, 3, 1)

    def forward(self, x):
        x_level_0, x_level_1, x_level_2 = x[0], x[1], x[2]

        # bring all inputs to the resolution and width of the target level
        if self.level == 0:
            level_0_resized = x_level_0
            level_1_resized = self.stride_level_1(x_level_1)
            level_2_resized = self.stride_level_2(x_level_2)

        elif self.level == 1:
            level_0_resized = self.stride_level_0(x_level_0)
            level_1_resized = x_level_1
            level_2_resized = self.stride_level_2(x_level_2)

        elif self.level == 2:
            level_0_resized = self.stride_level_0(x_level_0)
            level_1_resized = self.stride_level_1(x_level_1)
            level_2_resized = x_level_2

        # per-pixel softmax over the three levels gives the fusion weights
        level_0_weight_v = self.weight_level_0(level_0_resized)
        level_1_weight_v = self.weight_level_1(level_1_resized)
        level_2_weight_v = self.weight_level_2(level_2_resized)

        levels_weight_v = torch.cat((level_0_weight_v, level_1_weight_v, level_2_weight_v), 1)
        w = self.weights_levels(levels_weight_v)
        w = F.softmax(w, dim=1)

        fused_out_reduced = level_0_resized * w[:, :1] + level_1_resized * w[:, 1:2] + level_2_resized * w[:, 2:]
        return self.conv(fused_out_reduced)
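The resizing layers above only work if every input ends up at exactly the target resolution. The following sketch checks this with the standard output-size formulas for a convolution and a transposed convolution (dilation 1, no output padding). The 640-pixel input size and the helper names are assumptions for illustration.

```python
def conv_out(n, k, s, p=0):
    """Output size of a convolution with kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p=0):
    """Output size of a transposed convolution (dilation 1, output_padding 0)."""
    return (n - 1) * s - 2 * p + k


img = 640  # assumed input size
p3, p4, p5 = img // 8, img // 16, img // 32  # strides 8, 16, 32

resized = {
    "P4 -> P3, ConvTranspose k=2 s=2": deconv_out(p4, 2, 2),
    "P5 -> P3, ConvTranspose k=4 s=4": deconv_out(p5, 4, 4),
    "P5 -> P4, ConvTranspose k=2 s=2": deconv_out(p5, 2, 2),
    "P3 -> P4, Conv k=2 s=2 p=0": conv_out(p3, 2, 2),
    "P3 -> P5, Conv k=4 s=4 p=0": conv_out(p3, 4, 4),
    "P4 -> P5, Conv k=2 s=2 p=0": conv_out(p4, 2, 2),
}

for name, size in resized.items():
    print(f"{name}: {size}")
```

With kernel size equal to stride and zero padding, both directions scale the feature map by an exact factor, so the resized tensors can be multiplied by the fusion weights and summed without cropping.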
Next, add the following import at the top of models/yolo.py in the YOLOv5/v7 project:
from models.afpn import *
Then search for def parse_model(d, ch), locate the chain of module-type checks, and add the following branches:
        elif m is ASFF2:
            c1, c2 = [ch[f[0]], ch[f[1]]], args[0]
            c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
        elif m is ASFF3:
            c1, c2 = [ch[f[0]], ch[f[1]], ch[f[2]]], args[0]
            c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
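To see what these branches compute, here is a standalone sketch of the same argument resolution outside of parse_model. The make_divisible helper is re-implemented here for illustration (rounding up to a multiple of the divisor), and resolve_asff_args and the ch list contents are hypothetical.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (illustrative re-implementation)."""
    return math.ceil(x / divisor) * divisor


def resolve_asff_args(ch, f, args, gw):
    """Mimic the ASFF branches: collect input channels and scale the output width.

    ch:   output channels of every layer built so far
    f:    absolute indices of the input layers
    args: [c2, level] as written in the YAML
    gw:   width_multiple from the YAML
    """
    c1 = [ch[i] for i in f]
    c2 = make_divisible(args[0] * gw, 8)
    return [c1, c2, *args[1:]]


ch = [0] * 31            # placeholder channel list; only entries 29 and 30 matter here
ch[29], ch[30] = 64, 128
resolved = resolve_asff_args(ch, f=[29, 30], args=[64, 0], gw=1.0)
print(resolved)
```

Note that the ASFF modules shown earlier ignore c2 and always output c1[level] channels, while the c2 computed here is what later layers are told to expect. The width written in the YAML should therefore match the channel count of the input at the chosen level.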
After completing step two, create a new file yolov7-tiny-afpn.yaml in the models folder of the YOLOv7 project and add the following configuration:
# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# yolov7-tiny backbone
backbone:
  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True
  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2

   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4

   [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 7

   [-1, 1, MP, []],  # 8-P3/8
   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 14

   [-1, 1, MP, []],  # 15-P4/16
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 21

   [-1, 1, MP, []],  # 22-P5/32
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 28
  ]

# yolov7-tiny head
head:
  [[14, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 29 1x1 conv, reduce channels of backbone P3
   [21, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 30 1x1 conv, reduce channels of backbone P4

   [[29, 30], 1, ASFF2, [64, 0]],  # 31
   [[29, 30], 1, ASFF2, [128, 1]],  # 32

   [-2, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],  # 33
   [-2, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],  # 34

   [28, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 35 1x1 conv, reduce channels of backbone P5

   [[33, 34, 35], 1, ASFF3, [64, 0]],  # 36
   [[33, 34, 35], 1, ASFF3, [128, 1]],  # 37
   [[33, 34, 35], 1, ASFF3, [256, 2]],  # 38

   [36, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],  # 39
   [37, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],  # 40
   [38, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]],  # 41

   [[39, 40, 41], 1, IDetect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

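The head mixes absolute indices (14, 21, 28) with relative ones (-2), which is easy to get wrong when editing the YAML. The sketch below resolves the head's from column into absolute layer indices; resolve_from is a hypothetical helper written for this check, and the head_from table is copied from the head definition.

```python
def resolve_from(layer_index, from_spec):
    """Turn a YAML 'from' entry into absolute layer indices.

    Negative values are relative to the current layer; non-negative values are absolute.
    """
    refs = from_spec if isinstance(from_spec, list) else [from_spec]
    return [layer_index + r if r < 0 else r for r in refs]


# 'from' column of the head, keyed by absolute layer index
head_from = {
    29: 14, 30: 21,
    31: [29, 30], 32: [29, 30],
    33: -2, 34: -2,
    35: 28,
    36: [33, 34, 35], 37: [33, 34, 35], 38: [33, 34, 35],
    39: 36, 40: 37, 41: 38,
    42: [39, 40, 41],
}

inputs = {i: resolve_from(i, f) for i, f in head_from.items()}
for i, srcs in inputs.items():
    print(i, "<-", srcs)
```

Layer 33 reads from layer 31 and layer 34 from layer 32, so each ASFF2 output gets its own 3x3 convolution before the ASFF3 stage.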
                 from  n    params  module                                  arguments
  0                -1  1       928  models.common.Conv                      [3, 32, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
  2                -1  1      2112  models.common.Conv                      [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
  3                -2  1      2112  models.common.Conv                      [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
  4                -1  1      9280  models.common.Conv                      [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
  5                -1  1      9280  models.common.Conv                      [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
  6  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]
  7                -1  1      8320  models.common.Conv                      [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
  8                -1  1         0  models.common.MP                        []
  9                -1  1      4224  models.common.Conv                      [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 10                -2  1      4224  models.common.Conv                      [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 11                -1  1     36992  models.common.Conv                      [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 12                -1  1     36992  models.common.Conv                      [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 13  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 15                -1  1         0  models.common.MP                        []
 16                -1  1     16640  models.common.Conv                      [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 17                -2  1     16640  models.common.Conv                      [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 19                -1  1    147712  models.common.Conv                      [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 20  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]
 21                -1  1    131584  models.common.Conv                      [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 22                -1  1         0  models.common.MP                        []
 23                -1  1     66048  models.common.Conv                      [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 24                -2  1     66048  models.common.Conv                      [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 25                -1  1    590336  models.common.Conv                      [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 26                -1  1    590336  models.common.Conv                      [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 27  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]
 28                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 29                14  1      8320  models.common.Conv                      [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 30                21  1     33024  models.common.Conv                      [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 31          [29, 30]  1     70914  models.afpn.ASFF2                       [[64, 128], 64, 0]
 32          [29, 30]  1    182850  models.afpn.ASFF2                       [[64, 128], 128, 1]
 33                -2  1     36992  models.common.Conv                      [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 34                -2  1    147712  models.common.Conv                      [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 35                28  1    131584  models.common.Conv                      [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 36      [33, 34, 35]  1    333691  models.afpn.ASFF3                       [[64, 128, 256], 64, 0]
 37      [33, 34, 35]  1    315131  models.afpn.ASFF3                       [[64, 128, 256], 128, 1]
 38      [33, 34, 35]  1    990843  models.afpn.ASFF3                       [[64, 128, 256], 256, 2]
 39                36  1     73984  models.common.Conv                      [64, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 40                37  1    295424  models.common.Conv                      [128, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 41                38  1   1180672  models.common.Conv                      [256, 512, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
 42      [39, 40, 41]  1     17132  models.yolo.IDetect                     [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]

Model Summary: 251 layers, 6282689 parameters, 6282689 gradients, 18.9 GFLOPS
If running the model prints the output above, the modification was successful.
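As an extra sanity check, the parameter counts reported for the ASFF layers can be reproduced by hand. The sketch below assumes that Conv is a bias-free convolution followed by BatchNorm (two learnable parameters per output channel) and that the transposed and plain convolutions carry a bias, as in the afpn.py code; the helper names are illustrative.

```python
def conv_bn(c_in, c_out, k):
    """Bias-free k x k convolution plus BatchNorm (weight and bias per channel)."""
    return c_in * c_out * k * k + 2 * c_out


def conv_bias(c_in, c_out, k):
    """k x k convolution or transposed convolution with a bias term."""
    return c_in * c_out * k * k + c_out


def asff_params(c1, level, compress_c=8):
    """Learnable parameters of an ASFF2/ASFF3 module with input channels c1."""
    inter = c1[level]
    n = len(c1)
    total = 0
    for j, c in enumerate(c1):
        if j == level:
            continue
        factor = 2 ** abs(j - level)            # kernel size equals the scale factor
        if j > level:
            total += conv_bias(c, inter, factor)  # transposed conv upsampling
        else:
            total += conv_bn(c, inter, factor)    # strided conv downsampling
    total += n * conv_bn(inter, compress_c, 1)    # per-level weight features
    total += conv_bias(compress_c * n, n, 1)      # fusion-weight logits
    total += conv_bn(inter, inter, 3)             # output 3x3 conv
    return total


counts = {
    31: asff_params([64, 128], 0),
    32: asff_params([64, 128], 1),
    36: asff_params([64, 128, 256], 0),
    37: asff_params([64, 128, 256], 1),
    38: asff_params([64, 128, 256], 2),
}
print(counts)
```

The five values agree with the params column for layers 31, 32, 36, 37 and 38 in the summary, which suggests the modules were built with the intended channel configuration.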
After completing step two, create a new file yolov5s-afpn.yaml in the models folder of the YOLOv5 project and add the following configuration.