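This post maps the YOLOv7 architecture diagram onto the yolov7.yaml file line by line, so that the whole network is easier to follow.
The annotated file is as follows: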
- # parameters
- nc: 80 # number of classes
- depth_multiple: 1.0 # model depth multiple
- width_multiple: 1.0 # layer channel multiple
-
- # anchors
- anchors:
- - [12,16, 19,36, 40,28] # P3/8
- - [36,75, 76,55, 72,146] # P4/16
- - [142,110, 192,243, 459,401] # P5/32
-
- # yolov7 backbone
- backbone:
- # [from, number, module, args]; the input is 640*640*3
- [[-1, 1, Conv, [32, 3, 1]], # 0 (layer 0) 640*640*32
-
- [-1, 1, Conv, [64, 3, 2]], # 1-P1/2 320*320*64
- [-1, 1, Conv, [64, 3, 1]],
-
- [-1, 1, Conv, [128, 3, 2]], # 3-P2/4 160*160*128
-
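- # This whole part is an ELAN block. H and W are unchanged; the Concat in the second-to-last row takes the channel count from C to 2C, and the final output channel count is set by the Conv in the last row (here 2C)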
- [-1, 1, Conv, [64, 1, 1]],
- [-2, 1, Conv, [64, 1, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [[-1, -3, -5, -6], 1, Concat, [1]], # 160*160*256; Concat joins its inputs along the channel dimension
- [-1, 1, Conv, [256, 1, 1]], # 11 160*160*256
-
- # This part is MP1: H and W are halved, C is unchanged
- [-1, 1, MP, []], # 80*80*256; MP is max pooling (MaxPooling): H and W are halved, channels unchanged
- [-1, 1, Conv, [128, 1, 1]],
- [-3, 1, Conv, [128, 1, 1]],
- [-1, 1, Conv, [128, 3, 2]],
- [[-1, -3], 1, Concat, [1]], # 16-P3/8 80*80*256
-
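- # ELAN: H and W are unchanged; the Concat in the second-to-last row takes the channel count from C to 2C, and the final output channel count is set by the Conv in the last row (here 2C)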
- [-1, 1, Conv, [128, 1, 1]],
- [-2, 1, Conv, [128, 1, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [[-1, -3, -5, -6], 1, Concat, [1]], # 80*80*512
- [-1, 1, Conv, [512, 1, 1]], # 24 80*80*512
-
- # MP1 + ELAN
- [-1, 1, MP, []],
- [-1, 1, Conv, [256, 1, 1]],
- [-3, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [256, 3, 2]],
- [[-1, -3], 1, Concat, [1]], # 29-P4/16 40*40*512
- [-1, 1, Conv, [256, 1, 1]],
- [-2, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [[-1, -3, -5, -6], 1, Concat, [1]], # 40*40*1024
- [-1, 1, Conv, [1024, 1, 1]], # 37 40*40*1024
-
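- # MP1 + ELAN. Note that this ELAN differs from the earlier ones: the channel count is not doubled, so the output stays at 1024, the same as its input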
- [-1, 1, MP, []],
- [-1, 1, Conv, [512, 1, 1]],
- [-3, 1, Conv, [512, 1, 1]],
- [-1, 1, Conv, [512, 3, 2]],
- [[-1, -3], 1, Concat, [1]], # 42-P5/32 20*20*1024
- [-1, 1, Conv, [256, 1, 1]],
- [-2, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [[-1, -3, -5, -6], 1, Concat, [1]], # 20*20*1024
- [-1, 1, Conv, [1024, 1, 1]], # 50 20*20*1024
- ]
-
- # yolov7 head
- head:
- [[-1, 1, SPPCSPC, [512]], # 51 20*20*512; only the channel count changes
-
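- # UP+CONCAT: the first two rows form the UP module, the third row convolves the output of the second MP1+ELAN group (layer 37), and the last row concatenates the two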
- [-1, 1, Conv, [256, 1, 1]],
- [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 40*40*256; upsampling doubles H and W, channels unchanged
- [37, 1, Conv, [256, 1, 1]], # route backbone P4
- [[-1, -2], 1, Concat, [1]], # 40*40*512
-
- # ELAN-H
- [-1, 1, Conv, [256, 1, 1]],
- [-2, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]], # 40*40*1024
- [-1, 1, Conv, [256, 1, 1]], # 63 40*40*256
-
- # UP+CONCAT
- [-1, 1, Conv, [128, 1, 1]],
- [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 80*80*128
- [24, 1, Conv, [128, 1, 1]], # route backbone P3
- [[-1, -2], 1, Concat, [1]], # 80*80*256
-
- # ELAN-H
- [-1, 1, Conv, [128, 1, 1]],
- [-2, 1, Conv, [128, 1, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [-1, 1, Conv, [64, 3, 1]],
- [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]], # 80*80*512
- [-1, 1, Conv, [128, 1, 1]], # 75 80*80*128
-
- # MP2
- [-1, 1, MP, []], # 40*40*128
- [-1, 1, Conv, [128, 1, 1]],
- [-3, 1, Conv, [128, 1, 1]],
- [-1, 1, Conv, [128, 3, 2]], # 40*40*128
- [[-1, -3, 63], 1, Concat, [1]], # 40*40*512
-
- # ELAN-H
- [-1, 1, Conv, [256, 1, 1]],
- [-2, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [-1, 1, Conv, [128, 3, 1]],
- [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]], # 40*40*1024
- [-1, 1, Conv, [256, 1, 1]], # 88 40*40*256
-
- # MP2
- [-1, 1, MP, []], # 20*20*256
- [-1, 1, Conv, [256, 1, 1]],
- [-3, 1, Conv, [256, 1, 1]],
- [-1, 1, Conv, [256, 3, 2]], # 20*20*256
- [[-1, -3, 51], 1, Concat, [1]], # 20*20*1024
-
- # ELAN-H
- [-1, 1, Conv, [512, 1, 1]],
- [-2, 1, Conv, [512, 1, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [-1, 1, Conv, [256, 3, 1]],
- [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]], # 20*20*2048
- [-1, 1, Conv, [512, 1, 1]], # 101 20*20*512
-
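- # RepConv is a re-parameterized convolution: its training-time branches are folded into a single 3*3 convolution to speed up inference; H and W are unchanged. See the paper for details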
- [75, 1, RepConv, [256, 3, 1]],
- [88, 1, RepConv, [512, 3, 1]],
- [101, 1, RepConv, [1024, 3, 1]],
-
- [[102,103,104], 1, IDetect, [nc, anchors]], # Detect(P3, P4, P5)
- ]
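The `from` column is easiest to check by tracing shapes mechanically. The sketch below is a minimal, illustrative tracer, not the repository's own model parser. It assumes each `Conv` pads by `k // 2` and that `MP` is a 2x2 max pool with stride 2, and it covers only the first 17 layers copied from the yaml above. The names `conv_out`, `trace_shapes`, and `LAYERS` are made up for this example.

```python
def conv_out(size, k, s):
    # Spatial size after a conv with kernel k, stride s, padding k // 2 (assumed)
    return (size + 2 * (k // 2) - k) // s + 1


def trace_shapes(layers, input_shape=(640, 640, 3)):
    """Return the (H, W, C) output shape of every layer in a yaml-style list."""
    shapes = []  # shapes[i] is the output of layer i
    for frm, module, args in layers:
        sources = frm if isinstance(frm, list) else [frm]
        # Negative indices count back from the current layer; layer 0 reads the image
        ins = [shapes[s] if shapes else input_shape for s in sources]
        h, w, c = ins[0]
        if module == "Conv":
            c_out, k, s = args
            out = (conv_out(h, k, s), conv_out(w, k, s), c_out)
        elif module == "MP":
            out = (h // 2, w // 2, c)
        elif module == "Concat":
            if any(i[:2] != (h, w) for i in ins):
                raise ValueError("Concat inputs must share H and W")
            out = (h, w, sum(i[2] for i in ins))
        else:
            raise ValueError(f"unsupported module: {module}")
        shapes.append(out)
    return shapes


LAYERS = [
    (-1, "Conv", [32, 3, 1]),            # 0
    (-1, "Conv", [64, 3, 2]),            # 1-P1/2
    (-1, "Conv", [64, 3, 1]),            # 2
    (-1, "Conv", [128, 3, 2]),           # 3-P2/4
    (-1, "Conv", [64, 1, 1]),            # 4  ELAN starts
    (-2, "Conv", [64, 1, 1]),            # 5
    (-1, "Conv", [64, 3, 1]),            # 6
    (-1, "Conv", [64, 3, 1]),            # 7
    (-1, "Conv", [64, 3, 1]),            # 8
    (-1, "Conv", [64, 3, 1]),            # 9
    ([-1, -3, -5, -6], "Concat", [1]),   # 10
    (-1, "Conv", [256, 1, 1]),           # 11 ELAN ends
    (-1, "MP", []),                      # 12 MP1 starts
    (-1, "Conv", [128, 1, 1]),           # 13
    (-3, "Conv", [128, 1, 1]),           # 14
    (-1, "Conv", [128, 3, 2]),           # 15
    ([-1, -3], "Concat", [1]),           # 16-P3/8
]

shapes = trace_shapes(LAYERS)
for index, shape in enumerate(shapes):
    print(index, shape)
```

Layers 11 and 16 come out as 160*160*256 and 80*80*256, matching the comments in the yaml. Extending the list with the remaining rows (plus an upsample case) is a quick way to verify the rest of the annotations.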
Reference: "深入浅出 Yolo 系列之 Yolov7 基础网络结构详解" (an accessible walkthrough of the YOLOv7 base network structure), CSDN blog by 计算机视觉linke
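The figure above is the YOLOv7 structure diagram I followed. In it, the output of the lower-right MP2 is labeled 80*80*256; by my calculation it should be 40*40*256 (the two 128-channel branches ending at layers 77 and 79, before layer 63 is concatenated).
The MP1, MP2, and UP modules are hard to read off the diagram and easy to misinterpret. I recommend tracing through the yaml file yourself, where the logic is clear.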
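The shape arithmetic for these three modules is short enough to write down. This is a minimal sketch, assuming even H and W; the helper names `mp_block` and `up_concat` are hypothetical, and the channel counts are taken from the yaml above.

```python
def mp_block(h, w, c_branch, c_lateral=0):
    """Output shape of an MP block.

    Two parallel branches (max pool then 1x1 conv, and 1x1 conv then
    stride-2 3x3 conv) each give c_branch channels at half resolution.
    MP2 in the head also concatenates a lateral map with c_lateral channels.
    """
    return (h // 2, w // 2, 2 * c_branch + c_lateral)


def up_concat(h, w, c_up, c_route):
    """Output shape of UP + Concat: 1x1 conv to c_up channels, 2x upsample,
    then concat with a 1x1-convolved backbone route of c_route channels."""
    return (h * 2, w * 2, c_up + c_route)


mp1 = mp_block(160, 160, 128)                 # backbone layers 12-16
mp2_branches = mp_block(80, 80, 128)          # head layers 76-79, before the lateral concat
mp2 = mp_block(80, 80, 128, c_lateral=256)    # head layer 80, with layer 63 concatenated
up = up_concat(20, 20, 256, 256)              # head layers 52-55

print("MP1:", mp1)
print("MP2 branches only:", mp2_branches)
print("MP2 with lateral:", mp2)
print("UP + Concat:", up)
```

The only difference between MP1 and MP2 in this view is the extra lateral input, which is why MP2 ends at 512 channels while its two branches alone give 40*40*256.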
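The ELAN module in the diagram is much easier to understand than the one in the paper.
SPPCSPC is not explained in the yaml file; its code lives in common.py and is reproduced below. Reading it alongside the diagram makes the structure clear.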
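- class SPPCSPC(nn.Module):
-     # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
-     # Excerpt from common.py: relies on torch, torch.nn as nn, and the Conv block defined there
-     def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
-         super(SPPCSPC, self).__init__()
-         c_ = int(2 * c2 * e)  # hidden channels
-         self.cv1 = Conv(c1, c_, 1, 1)
-         self.cv2 = Conv(c1, c_, 1, 1)  # shortcut branch
-         self.cv3 = Conv(c_, c_, 3, 1)
-         self.cv4 = Conv(c_, c_, 1, 1)
-         # stride-1 max pools with padding k // 2 keep H and W unchanged
-         self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
-         self.cv5 = Conv(4 * c_, c_, 1, 1)  # 4 = x1 plus the three pooled copies
-         self.cv6 = Conv(c_, c_, 3, 1)
-         self.cv7 = Conv(2 * c_, c2, 1, 1)  # fuses main and shortcut branches
-
-     def forward(self, x):
-         # main branch: 1x1 -> 3x3 -> 1x1 convs, then SPP (x1 and its pooled copies) and two more convs
-         x1 = self.cv4(self.cv3(self.cv1(x)))
-         y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
-         # shortcut branch: a single 1x1 conv on the input
-         y2 = self.cv2(x)
-         # concatenate both branches along the channel axis and fuse them
-         return self.cv7(torch.cat((y1, y2), dim=1))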
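To see why SPPCSPC changes only the channel count, the bookkeeping in the class above can be replayed with plain integers. This is an illustrative sketch, not part of common.py; `sppcspc_channels` and `pooled_size` are hypothetical helper names, and the pooled size uses the standard max-pool output formula (without dilation) for stride 1.

```python
def sppcspc_channels(c1, c2, e=0.5, k=(5, 9, 13)):
    """Channel counts at each stage of SPPCSPC, mirroring its __init__."""
    c_ = int(2 * c2 * e)               # hidden channels
    return {
        "in": c1,
        "hidden": c_,
        "spp_concat": (1 + len(k)) * c_,  # x1 plus one pooled copy per kernel
        "final_concat": 2 * c_,           # main branch + shortcut branch
        "out": c2,
    }


def pooled_size(size, k, stride=1):
    """Spatial size after MaxPool2d(kernel_size=k, stride=stride, padding=k // 2)."""
    return (size + 2 * (k // 2) - k) // stride + 1


# Layer 51 in the yaml: input is layer 50 (20*20*1024), args are [512]
channels = sppcspc_channels(c1=1024, c2=512)
sizes = [pooled_size(20, k) for k in (5, 9, 13)]

print(channels)
print(sizes)
```

With the default `k`, the SPP concatenation has `4 * c_` channels, which is exactly what `cv5` expects; changing the number of pooling kernels would require changing `cv5` as well.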
The rightmost part is the detection head; from bottom to top its outputs are P3, P4, and P5.
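As a rough check on what the three scales produce, the sketch below counts grid cells and predictions for a 640*640 input. It assumes the usual YOLO layout of `nc + 5` values per anchor (4 box coordinates, 1 objectness score, `nc` class scores) and 3 anchors per scale, as in the `anchors` section of the yaml. The function name `detect_summary` is made up for this example.

```python
def detect_summary(img=640, strides=(8, 16, 32), anchors_per_scale=3, nc=80):
    """Grid sizes and prediction counts for the P3, P4, P5 outputs."""
    values_per_anchor = nc + 5  # box (4) + objectness (1) + class scores (nc)
    grids = [img // s for s in strides]
    predictions = [g * g * anchors_per_scale for g in grids]
    return {
        "grids": grids,
        "predictions": predictions,
        "total_predictions": sum(predictions),
        "values_per_anchor": values_per_anchor,
        "channels_per_scale": anchors_per_scale * values_per_anchor,
    }


summary = detect_summary()
for key, value in summary.items():
    print(key, value)
```

The grid sizes 80, 40, and 20 match the H and W of layers 75, 88, and 101, which feed the three RepConv layers ahead of the detection layer.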
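RepConv is a re-parameterized convolution: put simply, the multi-branch structure used during training is converted into an equivalent single-path structure for inference. Accuracy improves slightly and speed improves a lot. After reading the RepVGG paper I found the idea very worthwhile; the mathematical derivation is both interesting and sound. The schematic from the paper and the referenced blogger's flowchart are attached.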
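The core of the equivalence is that convolution is linear, so parallel branches can be summed into one kernel. The toy below shows this for a single channel in pure Python. It is a simplified illustration, not the YOLOv7 `RepConv` code: it ignores batch normalization (which RepVGG folds into each branch's weights and bias before summing), and the identity branch only applies when input and output channel counts match. `conv3x3` and `fuse_branches` are hypothetical names.

```python
def conv3x3(x, k):
    """3x3 cross-correlation with zero padding 1 on a 2D list of numbers."""
    h, w = len(x), len(x[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def fuse_branches(k3, k1, identity=True):
    """Fold a 1x1 weight and an optional identity branch into the centre of a 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + (1 if identity else 0)
    return fused


x = [[1, 2, 0, 3],
     [4, 1, 2, 1],
     [0, 3, 1, 2],
     [2, 0, 4, 1]]
k3 = [[1, 0, -1],
      [2, 1, 0],
      [0, -2, 1]]
k1 = 3  # weight of the 1x1 branch

# Training-time view: 3x3 branch + 1x1 branch + identity branch, summed
y3 = conv3x3(x, k3)
multi_branch = [[y3[i][j] + k1 * x[i][j] + x[i][j] for j in range(4)]
                for i in range(4)]

# Inference-time view: one 3x3 convolution with the fused kernel
single_path = conv3x3(x, fuse_branches(k3, k1))

print(multi_branch == single_path)
```

Both views give identical outputs, while the fused version needs only one convolution at inference time, which is where the speed gain comes from.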