
Object Detection with YOLOv3 (a PyTorch Implementation)


1. Purpose of This Post

There are already PyTorch implementations of YOLOv3 on GitHub, but taking someone else's code and using it as-is never feels quite right to me. In the spirit of doing it yourself, this post is written mainly as a learning exercise: I build the YOLOv3 network from scratch and then run predictions with the official pre-trained weights, which helps a great deal in understanding the model.

2. The Object Detection Task

Object detection is a computer vision task that involves identifying the presence, location, and type of one or more objects in a given photograph. It is a challenging problem that builds on object recognition (what is present), object localization (where it is and what its extent is), and object classification (what kind of object it is). For the photo below, for example, the task is to identify what is in the picture and where it is, and to mark each object with a bounding box.

[Image: three zebras (photo by Boegh)]

3. The YOLOv3 Model

There are plenty of introductions to YOLOv3 online (the original authors call the backbone "DarkNet", which sounds a bit odd), so I won't repeat them here. The network structure is shown in the figure below:

[Figure: YOLOv3 network structure]

One more thing: we will run predictions on the network we build using pre-trained weights, so download the weight file first (inside China, unless you have plenty of patience or a connection that never acts up, a download manager such as Thunder helps; Thunder really is handy):

DarkNet weights pre-trained on the MS COCO dataset
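If you would rather fetch the file from a script, here is a minimal sketch. The URL is an assumption on my part (the link published on the official YOLO website); any mirror of yolov3.weights works just as well.

import urllib.request

# Assumed download location for the MS COCO pre-trained weights (link may change or be slow).
url = 'https://pjreddie.com/media/files/yolov3.weights'
urllib.request.urlretrieve(url, 'yolov3.weights')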

4. Building the Model

Import the required libraries:

import numpy as np
import torch
import torch.nn as nn
import torchvision
from PIL import Image
import matplotlib.pyplot as plt

Define the layers of the YOLOv3 (DarkNet) network:

Each DarkNet layer consists of a convolution, an optional batch norm (BN) layer, and an activation. When a layer includes BN, its convolution has weights only and no bias.

The network produces predictions at layers 82, 94, and 106, i.e. three outputs at different strides; these three output layers have neither a BN layer nor an activation.

# DarkNet layer: conv + optional BN + optional LeakyReLU
class DarknetLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, bnorm=True, leaky=True):
        super().__init__()
        # no conv bias when the layer has a BN layer
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=not bnorm)
        self.bnorm = nn.BatchNorm2d(out_channels, eps=1e-3) if bnorm else None
        self.leaky = nn.LeakyReLU(0.1) if leaky else None

    def forward(self, x):
        x = self.conv(x)
        if self.bnorm is not None:
            x = self.bnorm(x)
        if self.leaky is not None:
            x = self.leaky(x)
        return x
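A quick shape check (my own sketch, not part of the original post): DarkNet downsamples with stride-2 convolutions instead of pooling, so a stride-2 layer halves the spatial resolution.

# Hypothetical example: a stride-2 DarknetLayer halves the feature-map size.
layer = DarknetLayer(3, 32, kernel_size=3, stride=2, padding=1)
print(layer(torch.randn(1, 3, 416, 416)).shape)  # torch.Size([1, 32, 208, 208])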

Define the blocks of the YOLOv3 network:

This borrows the residual-block idea from ResNet. Each block contains one skip connection: the input flows through the block's layers to produce an intermediate output, and a saved tensor is then added element-wise to that intermediate output to form the block's output.

# DarkNet block: a stack of DarkNet layers, optionally with a residual connection
class DarknetBlock(nn.Module):
    def __init__(self, layers, skip=True):
        super().__init__()
        self.skip = skip
        self.layers = nn.ModuleDict()
        for cfg in layers:
            self.layers[cfg['id']] = DarknetLayer(cfg['in_channels'], cfg['out_channels'], cfg['kernel_size'],
                                                  cfg['stride'], cfg['padding'], cfg['bnorm'], cfg['leaky'])

    def forward(self, x):
        count = 0
        for _, layer in self.layers.items():
            # save the tensor seen two layers before the end of the block
            if count == (len(self.layers) - 2) and self.skip:
                skip_connection = x
            count += 1
            x = layer(x)
        return x + skip_connection if self.skip else x

The code above stacks several DarkNet layers into one block. layers is a list of dictionaries, each of which specifies a layer's input channels, number of filters, kernel size, and so on. skip indicates whether the block should act as a residual block.

The if statement in forward saves the tensor seen two layers before the end of the block, so the skip connection always spans the last two layers. In blocks that begin with a stride-2 (downsampling) layer, the saved tensor is therefore the output of that stride-2 layer; in blocks with no stride-2 layer, it is simply the block's input. A small sketch follows.
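A quick sanity check (my own sketch, not part of the original post): a plain two-layer residual block keeps the input shape, which is exactly what the element-wise addition requires.

# Hypothetical example: a 1x1 + 3x3 residual block on a (1, 64, 52, 52) tensor.
block = DarknetBlock([
    {'id': 'a', 'in_channels': 64, 'out_channels': 32, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
    {'id': 'b', 'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
])
print(block(torch.randn(1, 64, 52, 52)).shape)  # torch.Size([1, 64, 52, 52])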

Stack the blocks into the full YOLOv3 network:

There are 106 layers in total, with outputs at layers 82, 94, and 106; the structure is a little involved.

# The DarkNet / YOLOv3 network
class Yolov3(nn.Module):
    def __init__(self):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.blocks = nn.ModuleDict()
        # layer0 -> layer4, input = (3, 416, 416), flow_out = (64, 208, 208)
        self.blocks['block0_4'] = DarknetBlock([
            {'id': 'layer_0', 'in_channels': 3, 'out_channels': 32, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_1', 'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_2', 'in_channels': 64, 'out_channels': 32, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_3', 'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer5 -> layer8, input = (64, 208, 208), flow_out = (128, 104, 104)
        self.blocks['block5_8'] = DarknetBlock([
            {'id': 'layer_5', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_6', 'in_channels': 128, 'out_channels': 64, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_7', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer9 -> layer11, input = (128, 104, 104), flow_out = (128, 104, 104)
        self.blocks['block9_11'] = DarknetBlock([
            {'id': 'layer_9', 'in_channels': 128, 'out_channels': 64, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_10', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer12 -> layer15, input = (128, 104, 104), flow_out = (256, 52, 52)
        self.blocks['block12_15'] = DarknetBlock([
            {'id': 'layer_12', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_13', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_14', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer16 -> layer36, input = (256, 52, 52), flow_out = (256, 52, 52)
        self.blocks['block16_18'] = DarknetBlock([
            {'id': 'layer_16', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_17', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block19_21'] = DarknetBlock([
            {'id': 'layer_19', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_20', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block22_24'] = DarknetBlock([
            {'id': 'layer_22', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_23', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block25_27'] = DarknetBlock([
            {'id': 'layer_25', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_26', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block28_30'] = DarknetBlock([
            {'id': 'layer_28', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_29', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block31_33'] = DarknetBlock([
            {'id': 'layer_31', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_32', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block34_36'] = DarknetBlock([
            {'id': 'layer_34', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_35', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer37 -> layer40, input = (256, 52, 52), flow_out = (512, 26, 26)
        self.blocks['block37_40'] = DarknetBlock([
            {'id': 'layer_37', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_38', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_39', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer41 -> layer61, input = (512, 26, 26), flow_out = (512, 26, 26)
        self.blocks['block41_43'] = DarknetBlock([
            {'id': 'layer_41', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_42', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block44_46'] = DarknetBlock([
            {'id': 'layer_44', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_45', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block47_49'] = DarknetBlock([
            {'id': 'layer_47', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_48', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block50_52'] = DarknetBlock([
            {'id': 'layer_50', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_51', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block53_55'] = DarknetBlock([
            {'id': 'layer_53', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_54', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block56_58'] = DarknetBlock([
            {'id': 'layer_56', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_57', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block59_61'] = DarknetBlock([
            {'id': 'layer_59', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_60', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer62 -> layer65, input = (512, 26, 26), flow_out = (1024, 13, 13)
        self.blocks['block62_65'] = DarknetBlock([
            {'id': 'layer_62', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_63', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_64', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer66 -> layer74, input = (1024, 13, 13), flow_out = (1024, 13, 13)
        self.blocks['block66_68'] = DarknetBlock([
            {'id': 'layer_66', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_67', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block69_71'] = DarknetBlock([
            {'id': 'layer_69', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_70', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        self.blocks['block72_74'] = DarknetBlock([
            {'id': 'layer_72', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_73', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}
        ])
        # layer75 -> layer79, input = (1024, 13, 13), flow_out = (512, 13, 13)
        self.blocks['block75_79'] = DarknetBlock([
            {'id': 'layer_75', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_76', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_77', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_78', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_79', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}
        ], skip=False)
        # layer80 -> layer82, input = (512, 13, 13), yolo_out = (255, 13, 13)
        self.blocks['yolo_82'] = DarknetBlock([
            {'id': 'layer_80', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_81', 'in_channels': 1024, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}
        ], skip=False)
        # layer83 -> layer86, input = (512, 13, 13) -> (256, 13, 13) -> upsample and concat with layer61 (512, 26, 26), flow_out = (768, 26, 26)
        self.blocks['block83_86'] = DarknetBlock([
            {'id': 'layer_84', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}
        ], skip=False)
        # layer87 -> layer91, input = (768, 26, 26), flow_out = (256, 26, 26)
        self.blocks['block87_91'] = DarknetBlock([
            {'id': 'layer_87', 'in_channels': 768, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_88', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_89', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_90', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_91', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}
        ], skip=False)
        # layer92 -> layer94, input = (256, 26, 26), yolo_out = (255, 26, 26)
        self.blocks['yolo_94'] = DarknetBlock([
            {'id': 'layer_92', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_93', 'in_channels': 512, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}
        ], skip=False)
        # layer95 -> layer98, input = (256, 26, 26) -> (128, 26, 26) -> upsample and concat with layer36 (256, 52, 52), flow_out = (384, 52, 52)
        self.blocks['block95_98'] = DarknetBlock([
            {'id': 'layer_96', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}
        ], skip=False)
        # layer99 -> layer106, input = (384, 52, 52), yolo_out = (255, 52, 52)
        self.blocks['yolo_106'] = DarknetBlock([
            {'id': 'layer_99', 'in_channels': 384, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_100', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_101', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_102', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_103', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_104', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_105', 'in_channels': 256, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}
        ], skip=False)

    def forward(self, x):
        x = self.blocks['block0_4'](x)
        x = self.blocks['block5_8'](x)
        x = self.blocks['block9_11'](x)
        x = self.blocks['block12_15'](x)
        x = self.blocks['block16_18'](x)
        x = self.blocks['block19_21'](x)
        x = self.blocks['block22_24'](x)
        x = self.blocks['block25_27'](x)
        x = self.blocks['block28_30'](x)
        x = self.blocks['block31_33'](x)
        x = self.blocks['block34_36'](x)
        skip36 = x
        x = self.blocks['block37_40'](x)
        x = self.blocks['block41_43'](x)
        x = self.blocks['block44_46'](x)
        x = self.blocks['block47_49'](x)
        x = self.blocks['block50_52'](x)
        x = self.blocks['block53_55'](x)
        x = self.blocks['block56_58'](x)
        x = self.blocks['block59_61'](x)
        skip61 = x
        x = self.blocks['block62_65'](x)
        x = self.blocks['block66_68'](x)
        x = self.blocks['block69_71'](x)
        x = self.blocks['block72_74'](x)
        x = self.blocks['block75_79'](x)
        yolo_82 = self.blocks['yolo_82'](x)
        x = self.blocks['block83_86'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip61), dim=1)
        x = self.blocks['block87_91'](x)
        yolo_94 = self.blocks['yolo_94'](x)
        x = self.blocks['block95_98'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip36), dim=1)
        yolo_106 = self.blocks['yolo_106'](x)
        return yolo_82, yolo_94, yolo_106

Define the model

model = Yolov3()

At this point you can use print(model) to dump and inspect the model structure.
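A quick sanity check (my own sketch, not part of the original post): a dummy forward pass should produce the three YOLO heads at strides 32, 16, and 8.

# Hypothetical check: feed a random 416x416 image and look at the three output shapes.
with torch.no_grad():
    yolo_82, yolo_94, yolo_106 = model(torch.randn(1, 3, 416, 416))
print(yolo_82.shape, yolo_94.shape, yolo_106.shape)
# expected: (1, 255, 13, 13), (1, 255, 26, 26), (1, 255, 52, 52)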

Loading the pre-trained weights

By now the weight file should have finished downloading. A small weight-reader class loads its parameters into our model:

# Weight reader
class WeightReader:
    def __init__(self, weight_file):
        with open(weight_file, 'rb') as fp:
            header = np.fromfile(fp, dtype=np.int32, count=5)
            self.header = torch.from_numpy(header)
            self.seen = self.header[3]
            # The rest of the values are the weights; load them up
            self.weights = np.fromfile(fp, dtype=np.float32)

    # Load the weights into the model
    def load_weights(self, model):
        ptr = 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    # Number of values in each Batch Norm parameter
                    num_bn_biases = bn.bias.numel()
                    # BN biases
                    bn_biases = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # BN weights
                    bn_weights = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # running means
                    bn_running_mean = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # running variances
                    bn_running_var = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # Cast the loaded values into the dims of the model weights
                    bn_biases = bn_biases.view_as(bn.bias.data)
                    bn_weights = bn_weights.view_as(bn.weight.data)
                    bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                    bn_running_var = bn_running_var.view_as(bn.running_var)
                    # Copy the data into the model
                    bn.bias.data.copy_(bn_biases)
                    bn.weight.data.copy_(bn_weights)
                    bn.running_mean.copy_(bn_running_mean)
                    bn.running_var.copy_(bn_running_var)
                else:
                    # Number of conv biases
                    num_biases = conv.bias.numel()
                    # Load the biases
                    conv_biases = torch.from_numpy(self.weights[ptr: ptr + num_biases])
                    ptr = ptr + num_biases
                    # Reshape according to the dims of the model weights
                    conv_biases = conv_biases.view_as(conv.bias.data)
                    # Finally copy the data
                    conv.bias.data.copy_(conv_biases)
                # Load the weights of the convolutional layer (both branches)
                num_weights = conv.weight.numel()
                conv_weights = torch.from_numpy(self.weights[ptr: ptr + num_weights])
                ptr = ptr + num_weights
                conv_weights = conv_weights.view_as(conv.weight.data)
                conv.weight.data.copy_(conv_weights)

    # Summarize the parameter counts
    def weight_summary(self, model):
        train_able, train_disable = 0, 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    train_able += (bn.bias.numel() + bn.weight.numel())
                    train_disable += (bn.running_mean.numel() + bn.running_var.numel())
                else:
                    train_able += conv.bias.numel()
                train_able += conv.weight.numel()
        print("total = %d" % (train_able + train_disable))
        print("count of train_able = %d" % train_able)
        print("count of train_disable = %d" % train_disable)

In the official pre-trained weight file, the first 5 values are a header and are discarded; only the remaining values are loaded into the model. Pay attention to the order in which the parameters are stored; here is a figure from the official documentation:

[Figure: layout of the parameters in the weight file]

Parameters are stored layer by layer in forward-pass order. If a DarkNet layer has a BN layer, the file stores, in order, the BN biases, BN weights, running means, and running variances, followed by the conv weights. If a layer has no BN, it stores the conv biases followed by the conv weights.

For a BN layer, the bias and weight are trainable parameters while the running mean and variance are not, but all of them must be loaded into the network.
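As a rough cross-check of these counts (my own sketch, not from the original post), the number of values stored for a single DarkNet layer can be computed by hand:

# Values stored per layer, assuming the layout described above:
#   with BN:    4*C_out (BN bias, weight, running mean, running var) + C_out*C_in*k*k (conv weights, no bias)
#   without BN: C_out (conv bias) + C_out*C_in*k*k (conv weights)
def layer_param_count(c_in, c_out, k, bnorm=True):
    conv = c_out * c_in * k * k
    return conv + (4 * c_out if bnorm else c_out)

# e.g. the 1x1 output conv layer_81 (1024 -> 255, no BN):
print(layer_param_count(1024, 255, 1, bnorm=False))  # 261375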

Load the parameters and print the parameter counts with the code below.

# Load the model parameters and check the parameter counts
# total number of parameters: 62,001,757
# trainable (conv/BN weights and biases): 61,949,149; non-trainable (BN running mean/var): 52,608
weight_reader = WeightReader('yolov3.weights')
weight_reader.load_weights(model)
weight_reader.weight_summary(model)

Input processing

Define an image-loading function that resizes the input image to the network input size (416), scales every pixel into [0, 1] (ToTensor divides by 255), and converts it into a four-dimensional tensor. It returns the tensor along with the image's original width and height.

# Load an image
def img_loader(photo_file, input_w, input_h):
    img = Image.open(photo_file)
    img_w, img_h = img.size
    img = img.resize((input_w, input_h))
    img = torchvision.transforms.ToTensor()(img)
    img = torch.unsqueeze(img, 0)
    # Return the resized image tensor and the original width and height
    return img, img_w, img_h

The model can now be run on an input image to obtain its raw outputs.

photo_file = 'zebra.jpg'
input_w, input_h = 416, 416
img, img_w, img_h = img_loader(photo_file, input_w, input_h)
y_hat = model(img)

The resulting y_hat is a tuple of three elements, each a four-dimensional tensor. What remains is to decode these tensors, filter them, apply IoU-based non-maximum suppression, and draw the bounding boxes. All of the functions involved are given below.
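For reference (my own addition, summarizing the YOLOv3 paper), decode_netout below applies the standard YOLOv3 box decoding. With raw outputs t, grid-cell offsets (c_x, c_y), and anchor sizes (p_w, p_h):

    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * exp(t_w)
    b_h = p_h * exp(t_h)

The code then divides b_x, b_y by the grid size and b_w, b_h by the network input size, so all coordinates end up relative to the image.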

# Bounding box class
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
        return self.score

def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

# Decode the raw network output into boxes
def decode_netout(netout, anchors, obj_thresh, net_w, net_h):
    grid_h, grid_w = netout.shape[1:]
    nb_box = 3
    netout = netout.permute(1, 2, 0).detach().numpy().reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh
    for i in range(grid_h * grid_w):
        row = i / grid_w
        col = i % grid_w
        for b in range(nb_box):
            # 4th element is the objectness score
            objectness = netout[int(row)][int(col)][b][4]
            if (objectness.all() <= obj_thresh): continue
            # first 4 elements are x, y, w and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w  # center position, unit: image width
            y = (row + y) / grid_h  # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w  # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h  # unit: image height
            # remaining elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x - w / 2, y - h / 2, x + w / 2, y + h / 2, objectness, classes)
            boxes.append(box)
    return boxes

# Rescale bounding box coordinates back to the original image.
# Takes the box list, the original image shape and the network input shape;
# the boxes are updated in place.
def correct_yolo_boxes(boxes, image_w, image_h, net_w, net_h):
    new_w, new_h = net_w, net_h
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w) / 2. / net_w, float(new_w) / net_w
        y_offset, y_scale = (net_h - new_h) / 2. / net_h, float(new_h) / net_h
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

# Helper for the IoU computation
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2, x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2, x4) - x3

# IoU of two boxes
def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax - box1.xmin, box1.ymax - box1.ymin
    w2, h2 = box2.xmax - box2.xmin, box2.ymax - box2.ymin
    union = w1 * h1 + w2 * h2 - intersect
    return float(intersect) / union

# Non-maximum suppression
def do_nms(boxes, nms_thresh):
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0: continue
            for j in range(i + 1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

# Keep the boxes that strongly predict an object: class confidence above thresh
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check whether the confidence for this label is high enough
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i] * 100)
                # don't break: several labels may trigger for one box
    return v_boxes, v_labels, v_scores

# Draw the bounding boxes
def draw_boxes(photo_file, v_boxes, v_labels, v_scores):
    # load the image
    data = plt.imread(photo_file)
    # plot the image
    plt.imshow(data)
    # get the context for drawing boxes
    ax = plt.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = plt.Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw the label and score in the top-left corner
        label = "%s (%.1f)" % (v_labels[i], v_scores[i])
        plt.text(x1, y1, label, color='white', bbox=dict(facecolor='red'))
    # show the plot
    plt.show()

Wrap the steps above in a single function.

def make_predict(photo_file):
    img, img_w, img_h = img_loader(photo_file, input_w, input_h)
    y_hat = model(img)
    boxes = []
    for i in range(len(y_hat)):
        # decode the output of the network
        boxes += decode_netout(y_hat[i][0], anchors[i], class_threshold, input_w, input_h)
    # correct the sizes of the bounding boxes for the shape of the image
    correct_yolo_boxes(boxes, img_w, img_h, input_w, input_h)
    # suppress non-maximal boxes
    do_nms(boxes, 0.5)
    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # summarize what we found
    for i in range(len(v_boxes)):
        print(v_labels[i], v_scores[i])
    # draw what we found
    draw_boxes(photo_file, v_boxes, v_labels, v_scores)

We also need to map the class indices output by the network to human-readable names. The labels that the pre-trained weights can predict are:

# Labels the pre-trained weights can predict
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
          "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
          "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
          "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
          "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
          "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
          "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
          "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]

Finally, run the wrapper function.

# Pre-defined anchors, one set per output scale
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# Network input width and height
input_w, input_h = 416, 416
# Class-confidence threshold
class_threshold = 0.75
# Load the image and run the prediction
photo_file = 'zebra.jpg'
make_predict(photo_file)

The result:

[Figure: the zebra photo with the detected bounding boxes and scores drawn on it]

References:

The YOLOv3 paper

How to Perform Object Detection With YOLOv3 in Keras

How to implement a YOLO (v3) object detector from scratch in PyTorch

YOLOv3 network structure and analysis (Chinese-language post)
