
Object Detection: Improving YOLOv5/YOLOv7 with the ConvNeXt Architecture (Pure Convolution, Surpassing Swin)


Paper: A ConvNet for the 2020s

Paper link: https://arxiv.org/abs/2201.03545

Source code: https://github.com/facebookresearch/ConvNeXt

A pure-convolution backbone that competes with the popular hierarchical vision Transformers, outperforming Swin on multiple tasks!

In the paper A ConvNet for the 2020s, Meta AI starts from ResNet and, borrowing design ideas from Swin Transformer, proposes a new CNN model: ConvNeXt. It outperforms Swin Transformer on image classification as well as on detection and segmentation, and, like vision Transformers, it has good scalability: performance keeps improving as data and model size grow.

ConvNeXt starts from a vanilla ResNet and gradually "modernizes" it with tricks taken from Swin Transformer. The paper uses two ResNet baselines: ResNet-50, whose FLOPs are close to Swin-T (4G vs 4.5G), and ResNet-200, whose FLOPs are close to Swin-B (15G). The first change is an improved training recipe; the model design is then refined step by step: macro design -> ResNeXt-ify -> inverted bottleneck -> large kernel sizes -> micro design. Because model performance correlates strongly with FLOPs, FLOPs are kept roughly constant throughout the optimization.
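The end product of these steps is the ConvNeXt block. Here is a minimal sketch following the paper's description, showing the large-kernel depthwise conv, the inverted-bottleneck MLP (dim -> 4*dim -> dim), and LayerScale; stochastic depth is omitted for brevity, and the class name is mine:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """One ConvNeXt block: 7x7 depthwise conv (large kernel), LayerNorm,
    inverted-bottleneck MLP with GELU, LayerScale, and a residual connection."""
    def __init__(self, dim, layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise
        self.norm = nn.LayerNorm(dim, eps=1e-6)    # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, 4 * dim)     # inverted bottleneck: expand 4x
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)     # project back down
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones(dim))  # LayerScale

    def forward(self, x):                          # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                  # (N, H, W, C) so LayerNorm/Linear act on C
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)                  # back to (N, C, H, W)
        return shortcut + x                        # residual (drop-path omitted here)
```

Note that, unlike a ResNet bottleneck, normalization and the pointwise convolutions run in channels-last layout, mirroring the LayerNorm/MLP structure of a Transformer block.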

Relevant code (the `ConvNeXt` model from the official repository):

  import torch
  import torch.nn as nn
  from timm.models.layers import trunc_normal_

  # `Block` and the channels_first-capable `LayerNorm` used below are defined
  # alongside this class in the official ConvNeXt repository.

  class ConvNeXt(nn.Module):
      r""" ConvNeXt
      A PyTorch impl of : `A ConvNet for the 2020s` -
      https://arxiv.org/pdf/2201.03545.pdf

      Args:
          in_chans (int): Number of input image channels. Default: 3
          num_classes (int): Number of classes for classification head. Default: 1000
          depths (tuple(int)): Number of blocks at each stage. Default: [3, 3, 9, 3]
          dims (list(int)): Feature dimension at each stage. Default: [96, 192, 384, 768]
          drop_path_rate (float): Stochastic depth rate. Default: 0.
          layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
          head_init_scale (float): Init scaling value for classifier weights and biases. Default: 1.
      """
      def __init__(self, in_chans=3, num_classes=1000,
                   depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], drop_path_rate=0.,
                   layer_scale_init_value=1e-6, head_init_scale=1.,
                   ):
          super().__init__()

          self.downsample_layers = nn.ModuleList()  # stem and 3 intermediate downsampling conv layers
          stem = nn.Sequential(
              nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),
              LayerNorm(dims[0], eps=1e-6, data_format="channels_first")
          )
          self.downsample_layers.append(stem)
          for i in range(3):
              downsample_layer = nn.Sequential(
                  LayerNorm(dims[i], eps=1e-6, data_format="channels_first"),
                  # 2x spatial downsampling between stages
                  nn.Conv2d(dims[i], dims[i+1], kernel_size=2, stride=2),
              )
              self.downsample_layers.append(downsample_layer)

          # 4 feature resolution stages, each consisting of multiple residual blocks
          self.stages = nn.ModuleList()
          dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]
          cur = 0
          for i in range(4):
              stage = nn.Sequential(
                  *[Block(dim=dims[i], drop_path=dp_rates[cur + j],
                          layer_scale_init_value=layer_scale_init_value)
                    for j in range(depths[i])]
              )
              self.stages.append(stage)
              cur += depths[i]

          self.norm = nn.LayerNorm(dims[-1], eps=1e-6)  # final norm layer
          self.head = nn.Linear(dims[-1], num_classes)

          self.apply(self._init_weights)
          self.head.weight.data.mul_(head_init_scale)
          self.head.bias.data.mul_(head_init_scale)

      def _init_weights(self, m):
          if isinstance(m, (nn.Conv2d, nn.Linear)):
              trunc_normal_(m.weight, std=.02)
              nn.init.constant_(m.bias, 0)

      def forward_features(self, x):
          for i in range(4):
              x = self.downsample_layers[i](x)
              x = self.stages[i](x)
          return self.norm(x.mean([-2, -1]))  # global average pooling, (N, C, H, W) -> (N, C)

      def forward(self, x):
          x = self.forward_features(x)
          x = self.head(x)
          return x
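Note that the `LayerNorm` used in the stem and downsample layers above is not `nn.LayerNorm`: it additionally supports channels-first input, normalizing over the channel dimension of an (N, C, H, W) tensor. A sketch consistent with the version in the official repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm(nn.Module):
    """LayerNorm supporting channels_last (N, H, W, C) and channels_first (N, C, H, W)."""
    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = eps
        self.data_format = data_format

    def forward(self, x):
        if self.data_format == "channels_last":
            # standard layer norm over the trailing channel dimension
            return F.layer_norm(x, self.weight.shape, self.weight, self.bias, self.eps)
        # channels_first: normalize over dim 1 by hand
        u = x.mean(1, keepdim=True)
        s = (x - u).pow(2).mean(1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.eps)
        return self.weight[:, None, None] * x + self.bias[:, None, None]
```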

In short, by carefully adopting the tricks that Swin Transformer established, ConvNeXt shows that pure convolution can once again surpass Transformers in the image domain.

As for how to plug ConvNeXt into YOLOv5: interested readers are welcome to follow along, so we can learn and improve together!
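The usual approach for such an integration is to drop the classification head and expose the stride-8/16/32 feature maps (P3/P4/P5) that YOLOv5's PANet neck consumes. A hypothetical sketch of that wiring, with `nn.Identity` stand-ins where the real ConvNeXt blocks would go; the class and names are illustrative, not YOLOv5 API:

```python
import torch
import torch.nn as nn

class ConvNeXtBackbone(nn.Module):
    """Hypothetical detection backbone: run ConvNeXt's four (downsample, stage)
    pairs and return the stride-8/16/32 maps a YOLOv5-style neck expects."""
    def __init__(self, dims=(96, 192, 384, 768), in_chans=3):
        super().__init__()
        # stand-in downsample layers: stride-4 stem, then three stride-2 convs
        self.downsample_layers = nn.ModuleList(
            [nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4)]
            + [nn.Conv2d(dims[i], dims[i + 1], kernel_size=2, stride=2) for i in range(3)]
        )
        # stand-in stages; in practice these would be stacks of ConvNeXt Blocks
        self.stages = nn.ModuleList(nn.Identity() for _ in range(4))

    def forward(self, x):
        feats = []
        for down, stage in zip(self.downsample_layers, self.stages):
            x = stage(down(x))
            feats.append(x)
        return feats[1:]  # strides 8, 16, 32 -> P3, P4, P5
```

In a real YOLOv5 integration, these modules would additionally be registered in the model's module definitions and referenced from the backbone section of the model YAML; only the multi-scale output contract shown here is essential.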


