[YOLO Improvements] Trying Every IoU Loss Function: DIoU Loss (Based on MMYOLO)

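DIoU Loss Function

Paper: https://arxiv.org/pdf/1911.08287

The DIoU loss (Distance Intersection over Union Loss) is a loss function commonly used in object detection to optimize bounding-box positions. It improves on the IoU loss: besides the overlap between the two boxes, it also takes the distance between their center points into account, which gives a more precise localization signal. The design principles and calculation steps of the DIoU loss are described below.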

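Design Principles

1. Limitations of IoU

  • The IoU (Intersection over Union) loss is based on the ratio of intersection to union between the predicted box and the ground-truth box; the larger this ratio, the closer the predicted box is to the ground truth.
  • When the predicted box and the ground-truth box do not overlap, however, IoU is 0 no matter how far apart the boxes are, so the IoU loss provides no useful gradient. This limits how efficiently the model can learn.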
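The plateau described above can be seen with a few lines of plain Python. This is a minimal sketch, not part of MMYOLO: the helper `iou` and the box coordinates are made up for illustration, and boxes are assumed to be in (x1, y1, x2, y2) format.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union


gt = (0.0, 0.0, 10.0, 10.0)
# Slide a 10x10 predicted box toward the target.
# None of these three positions overlaps the target.
iou_losses = [1.0 - iou((x, 0.0, x + 10.0, 10.0), gt)
              for x in (40.0, 30.0, 20.0)]
print(iou_losses)  # [1.0, 1.0, 1.0]
```

The loss stays at 1.0 while the box moves 20 units closer, so nothing in the loss value tells the optimizer which direction is better.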

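2. Introducing DIoU

  • The DIoU loss adds a center-distance term on top of IoU, so gradient descent still receives a useful signal even when the two boxes do not overlap.
  • Because it directly minimizes the distance between the geometric centers of the two boxes, the DIoU loss helps reduce localization error and speeds up convergence.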
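To illustrate the point about non-overlapping boxes, the sketch below evaluates the DIoU loss given in the calculation steps of this article (1 - IoU + d^2 / c^2) while a predicted box slides toward the target. The function name `diou_loss_single` and the coordinates are illustrative assumptions; this is plain Python, not the MMYOLO implementation.

```python
def diou_loss_single(p, g):
    """DIoU loss for one pair of (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    inter_h = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = inter_w * inter_h
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    # Squared distance between the two box centers (d^2)
    d2 = (((p[0] + p[2]) - (g[0] + g[2])) / 2) ** 2 \
        + (((p[1] + p[3]) - (g[1] + g[3])) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2)
    c2 = (max(p[2], g[2]) - min(p[0], g[0])) ** 2 \
        + (max(p[3], g[3]) - min(p[1], g[1])) ** 2
    return 1.0 - inter / union + d2 / c2


gt = (0.0, 0.0, 10.0, 10.0)
# Same 10x10 box at three non-overlapping positions, moving toward gt.
losses = [diou_loss_single((x, 0.0, x + 10.0, 10.0), gt)
          for x in (40.0, 30.0, 20.0)]
print([round(v, 4) for v in losses])  # [1.6154, 1.5294, 1.4]
```

IoU is 0 at all three positions, yet the loss shrinks as the centers approach, which is the signal the plain IoU loss lacks.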

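Calculation Steps

1. Compute IoU

  • Compute the intersection area I of the two bounding boxes A and B.
  • Compute the union area U of the two bounding boxes.
  • IoU is given by: $\text{IoU} = \frac{I}{U}$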

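2. Compute the center distance

  • Let the center of the predicted box be $(x_p, y_p)$ and the center of the ground-truth box be $(x_g, y_g)$.
  • The center distance d is given by: $d = \sqrt{(x_{p} - x_{g})^2 + (y_{p} - y_{g})^2}$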

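3. Compute the normalized center distance

  • Find the smallest rectangle that encloses both the predicted box and the ground-truth box (the smallest enclosing box) and compute its diagonal length c.
  • The normalized center distance is $\frac{d}{c}$. Dividing by c makes the distance term independent of the absolute size of the boxes.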
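A quick way to check the scale claim is to compute d / c for a pair of boxes and again after multiplying every coordinate by 10. The sketch below is illustrative only; `normalized_center_distance` is a hypothetical helper, and boxes are assumed to be (x1, y1, x2, y2).

```python
import math


def normalized_center_distance(p, g):
    """Return d / c for two (x1, y1, x2, y2) boxes."""
    # d: distance between the two box centers
    d = math.hypot(((p[0] + p[2]) - (g[0] + g[2])) / 2,
                   ((p[1] + p[3]) - (g[1] + g[3])) / 2)
    # c: diagonal of the smallest box enclosing both
    c = math.hypot(max(p[2], g[2]) - min(p[0], g[0]),
                   max(p[3], g[3]) - min(p[1], g[1]))
    return d / c


pred = (50.0, 50.0, 90.0, 100.0)
gt = (60.0, 60.0, 100.0, 120.0)
small = normalized_center_distance(pred, gt)
large = normalized_center_distance(tuple(10 * v for v in pred),
                                   tuple(10 * v for v in gt))
print(round(small, 4), round(large, 4))  # 0.2096 0.2096
```

Both d and c grow by the same factor, so the ratio is unchanged.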

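4. Compute DIoU

  • The DIoU loss is defined as: $\text{DIoU Loss} = 1 - \text{IoU} + \frac{d^2}{c^2}$
  • Here $\frac{d^2}{c^2}$ is the squared normalized center distance. Both centers lie inside the smallest enclosing box, so d is always smaller than c and the term stays between 0 and 1, on the same scale as the IoU term.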
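As a worked example of the four steps, take the first pair of boxes from the example code in this article: a predicted box (50, 50, 90, 100) and a ground-truth box (60, 60, 100, 120), both in (x1, y1, x2, y2) format. The centers are (70, 75) and (80, 90), and the smallest enclosing box is (50, 50, 100, 120).

```latex
\begin{aligned}
I &= (90 - 60)\,(100 - 60) = 1200, \\
U &= 40 \cdot 50 + 40 \cdot 60 - 1200 = 3200, \\
\text{IoU} &= \frac{1200}{3200} = 0.375, \\
d^2 &= (70 - 80)^2 + (75 - 90)^2 = 325, \\
c^2 &= (100 - 50)^2 + (120 - 50)^2 = 7400, \\
\text{DIoU Loss} &= 1 - 0.375 + \frac{325}{7400} \approx 0.669.
\end{aligned}
```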

PyTorch Source Code for Computing DIoU

import torch


def diou_loss(pred_boxes, gt_boxes, eps=1e-7):
    """
    Compute the DIoU loss.

    :param pred_boxes: predicted boxes, shape (batch_size, 4), format (x1, y1, x2, y2)
    :param gt_boxes: ground-truth boxes, shape (batch_size, 4), format (x1, y1, x2, y2)
    :param eps: small constant that guards against division by zero
    :return: per-box DIoU loss, shape (batch_size,)
    """
    # Coordinates of the intersection rectangle
    inter_x1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    inter_y1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    inter_x2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    inter_y2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    # Intersection area (zero when the boxes do not overlap)
    inter_area = torch.clamp(inter_x2 - inter_x1, min=0) * torch.clamp(inter_y2 - inter_y1, min=0)
    # Areas of the predicted and ground-truth boxes
    pred_area = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    # Union area
    union_area = pred_area + gt_area - inter_area
    # IoU
    iou = inter_area / (union_area + eps)
    # Center coordinates
    pred_center_x = (pred_boxes[:, 0] + pred_boxes[:, 2]) / 2
    pred_center_y = (pred_boxes[:, 1] + pred_boxes[:, 3]) / 2
    gt_center_x = (gt_boxes[:, 0] + gt_boxes[:, 2]) / 2
    gt_center_y = (gt_boxes[:, 1] + gt_boxes[:, 3]) / 2
    # Squared distance between the centers (d^2)
    center_distance = (pred_center_x - gt_center_x) ** 2 + (pred_center_y - gt_center_y) ** 2
    # Squared diagonal of the smallest enclosing box (c^2)
    enclose_x1 = torch.min(pred_boxes[:, 0], gt_boxes[:, 0])
    enclose_y1 = torch.min(pred_boxes[:, 1], gt_boxes[:, 1])
    enclose_x2 = torch.max(pred_boxes[:, 2], gt_boxes[:, 2])
    enclose_y2 = torch.max(pred_boxes[:, 3], gt_boxes[:, 3])
    enclose_diagonal = (enclose_x2 - enclose_x1) ** 2 + (enclose_y2 - enclose_y1) ** 2
    # DIoU
    diou = iou - center_distance / (enclose_diagonal + eps)
    # DIoU loss (returned directly so the function name is not shadowed)
    return 1 - diou


# Example (float tensors, so every operation runs in floating point)
pred_boxes = torch.tensor([[50., 50., 90., 100.], [70., 80., 120., 150.]])
gt_boxes = torch.tensor([[60., 60., 100., 120.], [80., 90., 130., 160.]])
loss = diou_loss(pred_boxes, gt_boxes)
print(loss)  # approximately tensor([0.6689, 0.4983])

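Switching to the DIoU Loss (Based on MMYOLO)

MMYOLO does not implement the DIoU loss, so the DIoU computation and the corresponding iou_mode have to be added to mmyolo/models/losses/iou_loss.py. After making the change, run the following in a terminal:

python setup.py install

Then update the configuration file. The branch added to iou_loss.py looks like this: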

elif iou_mode == 'diou':
    # DIoU = IoU - ρ^2(b_pred, b_gt) / c^2
    # squared diagonal of the smallest enclosing box (c^2)
    enclose_area = enclose_w**2 + enclose_h**2 + eps
    # calculate ρ^2(b_pred, b_gt):
    # squared euclidean distance between the center points of
    # b_pred (bbox2) and b_gt (bbox1). The bbox format is xyxy
    # (left-top xy and right-bottom xy), so each center coordinate is
    # (x1 + x2) / 2 and the squared difference is divided by 4.
    rho2_left_item = ((bbox2_x1 + bbox2_x2) - (bbox1_x1 + bbox1_x2))**2 / 4
    rho2_right_item = ((bbox2_y1 + bbox2_y2) -
                       (bbox1_y1 + bbox1_y2))**2 / 4
    rho2 = rho2_left_item + rho2_right_item  # rho^2 (ρ^2)
    ious = ious - rho2 / enclose_area

Modified configuration file (using configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py as an example)

_base_ = ['../_base_/default_runtime.py', '../_base_/det_p5_tta.py']

# ========================Frequently modified parameters======================
# -----data related-----
data_root = 'data/coco/'  # Root path of data
# Path of train annotation file
train_ann_file = 'annotations/instances_train2017.json'
train_data_prefix = 'train2017/'  # Prefix of train image path
# Path of val annotation file
val_ann_file = 'annotations/instances_val2017.json'
val_data_prefix = 'val2017/'  # Prefix of val image path
num_classes = 80  # Number of classes for classification
# Batch size of a single GPU during training
train_batch_size_per_gpu = 16
# Worker to pre-fetch data for each single GPU during training
train_num_workers = 8
# persistent_workers must be False if num_workers is 0
persistent_workers = True

# -----model related-----
# Basic size of multi-scale prior box
anchors = [
    [(10, 13), (16, 30), (33, 23)],  # P3/8
    [(30, 61), (62, 45), (59, 119)],  # P4/16
    [(116, 90), (156, 198), (373, 326)]  # P5/32
]

# -----train val related-----
# Base learning rate for optim_wrapper. Corresponding to 8xb16=128 bs
base_lr = 0.01
max_epochs = 300  # Maximum training epochs
model_test_cfg = dict(
    # The config of multi-label for multi-class prediction.
    multi_label=True,
    # The number of boxes before NMS
    nms_pre=30000,
    score_thr=0.001,  # Threshold to filter out boxes.
    nms=dict(type='nms', iou_threshold=0.65),  # NMS type and threshold
    max_per_img=300)  # Max number of detections of each image

# ========================Possible modified parameters========================
# -----data related-----
img_scale = (640, 640)  # width, height
# Dataset type, this will be used to define the dataset
dataset_type = 'YOLOv5CocoDataset'
# Batch size of a single GPU during validation
val_batch_size_per_gpu = 1
# Worker to pre-fetch data for each single GPU during validation
val_num_workers = 2
# Config of batch shapes. Only on val.
# It means not used if batch_shapes_cfg is None.
batch_shapes_cfg = dict(
    type='BatchShapePolicy',
    batch_size=val_batch_size_per_gpu,
    img_size=img_scale[0],
    # The image scale of padding should be divided by pad_size_divisor
    size_divisor=32,
    # Additional paddings for pixel scale
    extra_pad_ratio=0.5)

# -----model related-----
# The scaling factor that controls the depth of the network structure
deepen_factor = 0.33
# The scaling factor that controls the width of the network structure
widen_factor = 0.5
# Strides of multi-scale prior box
strides = [8, 16, 32]
num_det_layers = 3  # The number of model output scales
norm_cfg = dict(type='BN', momentum=0.03, eps=0.001)  # Normalization config

# -----train val related-----
affine_scale = 0.5  # YOLOv5RandomAffine scaling ratio
loss_cls_weight = 0.5
loss_bbox_weight = 0.05
loss_obj_weight = 1.0
prior_match_thr = 4.  # Priori box matching threshold
# The obj loss weights of the three output layers
obj_level_weights = [4., 1., 0.4]
lr_factor = 0.01  # Learning rate scaling factor
weight_decay = 0.0005
# Save model checkpoint and validation intervals
save_checkpoint_intervals = 10
# The maximum checkpoints to keep.
max_keep_ckpts = 3
# Single-scale training is recommended to
# be turned on, which can speed up training.
env_cfg = dict(cudnn_benchmark=True)

# ===============================Unmodified in most cases====================
model = dict(
    type='YOLODetector',
    data_preprocessor=dict(
        type='mmdet.DetDataPreprocessor',
        mean=[0., 0., 0.],
        std=[255., 255., 255.],
        bgr_to_rgb=True),
    backbone=dict(
        # Use the YOLOv8 backbone
        type='YOLOv8CSPDarknet',
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='SiLU', inplace=True)
    ),
    neck=dict(
        type='YOLOv5PAFPN',
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        in_channels=[256, 512, 1024],
        out_channels=[256, 512, 1024],
        num_csp_blocks=3,
        norm_cfg=norm_cfg,
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='YOLOv5Head',
        head_module=dict(
            type='YOLOv5HeadModule',
            num_classes=num_classes,
            in_channels=[256, 512, 1024],
            widen_factor=widen_factor,
            featmap_strides=strides,
            num_base_priors=3),
        prior_generator=dict(
            type='mmdet.YOLOAnchorGenerator',
            base_sizes=anchors,
            strides=strides),
        # scaled based on number of detection layers
        loss_cls=dict(
            type='mmdet.CrossEntropyLoss',
            use_sigmoid=True,
            reduction='mean',
            loss_weight=loss_cls_weight *
            (num_classes / 80 * 3 / num_det_layers)),
        # Modify here to swap the IoU loss function
        loss_bbox=dict(
            type='IoULoss',
            iou_mode='diou',
            bbox_format='xywh',
            eps=1e-7,
            reduction='mean',
            loss_weight=loss_bbox_weight * (3 / num_det_layers),
            return_iou=True),
        loss_obj=dict(
            type='mmdet.CrossEntropyLoss',
            use_sigmoid=True,
            reduction='mean',
            loss_weight=loss_obj_weight *
            ((img_scale[0] / 640)**2 * 3 / num_det_layers)),
        prior_match_thr=prior_match_thr,
        obj_level_weights=obj_level_weights),
    test_cfg=model_test_cfg)

albu_train_transforms = [
    dict(type='Blur', p=0.01),
    dict(type='MedianBlur', p=0.01),
    dict(type='ToGray', p=0.01),
    dict(type='CLAHE', p=0.01)
]

pre_transform = [
    dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
    dict(type='LoadAnnotations', with_bbox=True)
]

train_pipeline = [
    *pre_transform,
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        pre_transform=pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
        # img_scale is (width, height)
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        border_val=(114, 114, 114)),
    dict(
        type='mmdet.Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags']),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        }),
    dict(type='YOLOv5HSVRandomAug'),
    dict(type='mmdet.RandomFlip', prob=0.5),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                   'flip_direction'))
]

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=train_ann_file,
        data_prefix=dict(img=train_data_prefix),
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        pipeline=train_pipeline))

test_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=False,
        pad_val=dict(img=114)),
    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'pad_param'))
]

val_dataloader = dict(
    batch_size=val_batch_size_per_gpu,
    num_workers=val_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        test_mode=True,
        data_prefix=dict(img=val_data_prefix),
        ann_file=val_ann_file,
        pipeline=test_pipeline,
        batch_shapes_cfg=batch_shapes_cfg))

test_dataloader = val_dataloader

param_scheduler = None
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='SGD',
        lr=base_lr,
        momentum=0.937,
        weight_decay=weight_decay,
        nesterov=True,
        batch_size_per_gpu=train_batch_size_per_gpu),
    constructor='YOLOv5OptimizerConstructor')

default_hooks = dict(
    param_scheduler=dict(
        type='YOLOv5ParamSchedulerHook',
        scheduler_type='linear',
        lr_factor=lr_factor,
        max_epochs=max_epochs),
    checkpoint=dict(
        type='CheckpointHook',
        interval=save_checkpoint_intervals,
        save_best='auto',
        max_keep_ckpts=max_keep_ckpts))

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0001,
        update_buffers=True,
        strict_load=False,
        priority=49)
]

val_evaluator = dict(
    type='mmdet.CocoMetric',
    proposal_nums=(100, 1, 10),
    ann_file=data_root + val_ann_file,
    metric='bbox')
test_evaluator = val_evaluator

train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=max_epochs,
    val_interval=save_checkpoint_intervals)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

        

         
