
Deep Learning 05 - A One-Stage Fully Convolutional Object Detection Example (FCOS)

FCOS: A Simple and Strong Anchor-free Object Detector

One-stage fully convolutional object detection:

Fully Convolutional One-Stage Object Detection

Reference: EECS 498-007/598-005 Assignment 4-1: One-Stage Object Detector

Paper links:

 FCOS: Fully Convolutional One-Stage Object Detection

FCOS: A Simple and Strong Anchor-free Object Detector

FCOS overview:

Unlike anchor-based detectors, FCOS uses a fully convolutional network: for every location on the feature map it predicts the distances from that point to the four sides of the target box (left, top, right, bottom, i.e. LTRB), as shown in the figure below:

Pipeline:

This pipeline differs from the first version of the network in that the regression and centre-ness branches are placed together at the end. The second version of the network is used in this post.

FCOS: Fully Convolutional One-Stage Object Detection
FCOS: A Simple and Strong Anchor-free Object Detector

1. Implementing the backbone and Feature Pyramid Network

1. Backbone:

RegNetX-400MF is used as the backbone network.

The backbone's convolutional stages output the features C3, C4, C5. Each of these is passed through a 1x1 convolution to reduce the channel dimension, and the resulting lateral features are fused from the top down. Because the spatial sizes differ between levels, the upper-level feature is first upsampled by a factor of 2 with nearest-neighbor interpolation and then added to the level below. Finally, each fused feature goes through a 3x3 convolution with padding 1 to produce the new output feature maps.

Implementation of the 1x1 lateral convolutions:

The assignment requires the same number of output channels at every level.

    # Add THREE lateral 1x1 conv and THREE output 3x3 conv layers.
    self.fpn_params = nn.ModuleDict()
    # Replace "pass" statement with your code
    self.fpn_params['l3'] = nn.Conv2d(dummy_out_shapes[0][1][1], self.out_channels, 1, 1, 0)
    self.fpn_params['l4'] = nn.Conv2d(dummy_out_shapes[1][1][1], self.out_channels, 1, 1, 0)
    self.fpn_params['l5'] = nn.Conv2d(dummy_out_shapes[2][1][1], self.out_channels, 1, 1, 0)

    lat3 = self.fpn_params['l3'](backbone_feats['c3'])  # [2, 64, 28, 28]
    lat4 = self.fpn_params['l4'](backbone_feats['c4'])  # [2, 64, 14, 14]
    lat5 = self.fpn_params['l5'](backbone_feats['c5'])  # [2, 64, 7, 7]

The ModuleDict class:

It stores the convolution layers (submodules) in a dictionary.

Holds submodules in a dictionary.

ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods.

ModuleDict is an ordered dictionary that respects

  • the order of insertion, and

  • in update(), the order of the merged OrderedDict, dict (started from Python 3.6) or another ModuleDict (the argument to update()).

Note that update() with other unordered mapping types (e.g., Python’s plain dict before Python version 3.6) does not preserve the order of the merged mapping.

Parameters

modules (iterable, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module)
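As a quick illustration (a minimal standalone sketch, not taken from the assignment code), an nn.ModuleDict can be indexed like a plain dict while its submodules' parameters stay registered with the parent module:

    import torch
    import torch.nn as nn

    # Lateral 1x1 convs stored in an nn.ModuleDict, keyed by level name.
    laterals = nn.ModuleDict({
        'l3': nn.Conv2d(64, 32, kernel_size=1),
        'l4': nn.Conv2d(128, 32, kernel_size=1),
    })

    x = torch.randn(1, 64, 28, 28)
    out = laterals['l3'](x)                   # indexed like a regular dict
    print(out.shape)                          # torch.Size([1, 32, 28, 28])
    print(len(list(laterals.parameters())))   # 4 -- weights/biases are registered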

2. The FPN network

The FPN paper explains the structure with the following figure:

The input image is convolved to produce the feature maps C3, C4, C5. Each feature map goes through a 1x1 convolution to produce a lateral feature; the upper-level lateral feature is then upsampled by a factor of 2 with nearest-neighbor interpolation and merged (added) with the feature of the level below.

Why a factor of 2: the backbone strides are 8, 16, 32, so each level is exactly twice the previous one.

F.interpolate is used to resize lat5 and lat4 to the sizes of lat4 and lat3 respectively.

    lat5_resize = nn.functional.interpolate(lat5, size=(lat4.shape[2], lat4.shape[3]), mode='nearest')  # (14, 14)
    # merge lat4 with lat5_resize (2x upsampled)
    lat4 = lat4 + lat5_resize
    lat4_resize = nn.functional.interpolate(lat4, size=(lat3.shape[2], lat3.shape[3]), mode='nearest')
    # merge lat4_resize with lat3
    lat3 = lat3 + lat4_resize

The fused feature maps then go through a 3x3 convolution with padding 1 and stride 1 to produce the pyramid feature maps P3, P4, P5.

    self.fpn_params['p3'] = nn.Conv2d(self.out_channels, self.out_channels, 3, 1, 1)
    self.fpn_params['p4'] = nn.Conv2d(self.out_channels, self.out_channels, 3, 1, 1)
    self.fpn_params['p5'] = nn.Conv2d(self.out_channels, self.out_channels, 3, 1, 1)

    fpn_feats['p3'] = self.fpn_params['p3'](lat3)
    fpn_feats['p4'] = self.fpn_params['p4'](lat4)
    fpn_feats['p5'] = self.fpn_params['p5'](lat5)

The video "1.1.2 FPN结构详解" (reference 6) also gives a more detailed explanation of the structure.

2. Implementing the prediction network (Head)

The head predicts the classification, bounding box, and center-ness outputs.

In the prediction network, the feature maps produced by the FPN are fed into convolution layers.

Taking the classification branch as an example: the feature map goes through a 3x3 convolution followed by ReLU, then another identical 3x3 convolution + ReLU, and finally one more 3x3 convolution that produces the classification output. The classification output does not keep the input shape: the input feature map is (B, C, H, W), while the classification output is reshaped to (B, H*W, C). Similarly, the centre-ness and bounding-box outputs are reshaped to (B, H*W, 1) and (B, H*W, 4), where the 4 corresponds to the four LTRB distances of the bounding box.
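As a quick illustration of this shape change (a standalone sketch, not part of the head code), the spatial dimensions can be flattened and the channels moved last with view() and permute():

    import torch

    feat = torch.randn(2, 20, 14, 14)            # (B, C, H, W) raw head output
    B, C = feat.shape[0], feat.shape[1]
    flat = feat.view(B, C, -1).permute(0, 2, 1)  # flatten H*W, move channels last
    print(flat.shape)                            # torch.Size([2, 196, 20])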

1. Build the 3x3 stem convolution layers, initializing the weights from a normal distribution with mean 0 and standard deviation 0.01, and the biases to 0.

A loop over stem_channels builds the two stacks of conv + ReLU layers (one for classification, one for box regression).

This corresponds to the stem part of the figure.

    inputSize = in_channels
    for out_channel in stem_channels:
        # classification stem: 3x3 conv + ReLU
        conv1 = nn.Conv2d(inputSize, out_channel, 3, 1, 1)
        relu1 = nn.ReLU()
        # init weight: normal(mean=0, std=0.01)
        torch.nn.init.normal_(conv1.weight, 0, 0.01)
        # init bias to zero
        torch.nn.init.zeros_(conv1.bias)
        # append conv1 and relu1 to stem_cls
        stem_cls.append(conv1)
        stem_cls.append(relu1)
        # box stem: 3x3 conv + ReLU
        conv2 = nn.Conv2d(inputSize, out_channel, 3, 1, 1)
        relu2 = nn.ReLU()
        torch.nn.init.normal_(conv2.weight, 0, 0.01)
        torch.nn.init.zeros_(conv2.bias)
        # append conv2 and relu2 to stem_box
        stem_box.append(conv2)
        stem_box.append(relu2)
        inputSize = out_channel
 

2. Build one final 3x3 convolution layer for each of the three predictions.

    self.pred_cls = nn.Conv2d(stem_channels[-1], num_classes, 3, 1, 1)
    torch.nn.init.normal_(self.pred_cls.weight, 0, 0.01)
    torch.nn.init.zeros_(self.pred_cls.bias)
    self.pred_box = nn.Conv2d(stem_channels[-1], 4, 3, 1, 1)
    torch.nn.init.normal_(self.pred_box.weight, 0, 0.01)
    torch.nn.init.zeros_(self.pred_box.bias)
    self.pred_ctr = nn.Conv2d(stem_channels[-1], 1, 3, 1, 1)
    torch.nn.init.normal_(self.pred_ctr.weight, 0, 0.01)
    torch.nn.init.zeros_(self.pred_ctr.bias)

3. forward

Assemble the layers above to produce the three predictions.

feats_per_fpn_level contains the FPN features {p3, p4, p5}; each feature has shape (batch_size, fpn_channels, H, W). The three predictions must have the following shapes:
        1. Classification logits: `(batch_size, H * W, num_classes)`
        2. Box regression deltas: `(batch_size, H * W, 4)`
        3. Centerness logits:     `(batch_size, H * W, 1)`

Flow: feature -> stem layers -> prediction layer.

(Note: the outputs must be reshaped to the target shapes with view() and permute().)

    for level, feature in feats_per_fpn_level.items():
        # classification branch: stem convs, then prediction conv
        class_logits[level] = self.pred_cls(self.stem_cls(feature))
        # e.g. shape [2, 20, 14, 14]; reshape to (batch_size, H * W, num_classes)
        batch_size = class_logits[level].shape[0]
        num_cls = class_logits[level].shape[1]
        # flatten the spatial dims, then permute into the desired order
        class_logits[level] = class_logits[level].view(batch_size, num_cls, -1).permute(0, 2, 1)
        # box branch: reshape box regression deltas to (batch_size, H * W, 4)
        boxreg_deltas[level] = self.pred_box(self.stem_box(feature))
        boxreg_deltas[level] = boxreg_deltas[level].view(batch_size, 4, -1).permute(0, 2, 1)
        # centerness is predicted from the box stem; reshape to (batch_size, H * W, 1)
        centerness_logits[level] = self.pred_ctr(self.stem_box(feature))
        centerness_logits[level] = centerness_logits[level].view(batch_size, 1, -1).permute(0, 2, 1)

3. Training the network

The previous sections covered the structure of the network; now let us look at how it is trained.

Unlike a classification model, whose input is an image and a label (usually a one-hot vector), this model produces a large number of predicted boxes per image; the class labels are associated with boxes rather than with the whole image.

FCOS therefore uses GT targets (Ground Truth Targets). Each GT target carries three quantities: the class, the bounding box, and the centre-ness. Because the three predictions are all tied to locations on the FPN feature maps, the GT boxes can be assigned directly to those locations. A GT box consists of a class label plus the top-left and bottom-right corner coordinates, i.e. a 5-dimensional vector.

1. Getting every location on the feature maps

To train the network we first need the coordinates of every feature-map location; only with these locations can we produce bounding boxes and detect objects.


The feature map at each level has shape (batch_size, channels, H / stride, W / stride).

Feature-map location: Location(i, j) = (stride \times (i + 0.5), stride \times (j + 0.5))

  1. i, j: indices over the feature map's height and width
  2. 0.5: offset to the center of each feature-map cell
  3. stride: the stride of the corresponding FPN level

Therefore, by looping over the FPN levels we can read off each feature map's size and stride and compute the locations.

    # Replace "pass" statement with your code
    # x coordinates: stride * (i + 0.5), i in [0, H)   (feature shape [B, C, H, W])
    x = level_stride * torch.arange(0.5, feat_shape[2] + 0.5, step=1, device=device, dtype=dtype)
    # y coordinates: stride * (j + 0.5), j in [0, W)
    y = level_stride * torch.arange(0.5, feat_shape[3] + 0.5, step=1, device=device, dtype=dtype)
    # create a grid from these x, y values
    (xGrid, yGrid) = torch.meshgrid(x, y, indexing='xy')
    # xGrid shape per level: [14, 14], [7, 7], [4, 4]
    # add a new dimension of size 1 at the last position
    xGrid = xGrid.unsqueeze(dim=-1)
    yGrid = yGrid.unsqueeze(dim=-1)
    # concatenate the two and reshape to (H*W, 2)
    location_coords[level_name] = torch.cat((xGrid, yGrid), dim=2).view(feat_shape[3] * feat_shape[2], 2)

[Figures: location grids for P3, P4, and P5.]

2. Matching GT boxes to feature-map locations

With the feature-map locations in hand, we can match them to the GT boxes.

See Section 3.2 of the paper for the detailed matching rules.

    def fcos_match_locations_to_gt(
        locations_per_fpn_level: TensorDict,
        strides_per_fpn_level: Dict[str, int],
        gt_boxes: torch.Tensor,
    ) -> TensorDict:
        """
        Match centers of the locations of FPN feature with a set of GT bounding
        boxes of the input image. Since our model makes predictions at every FPN
        feature map location, we must supervise it with an appropriate GT box.
        There are multiple GT boxes in image, so FCOS has a set of heuristics to
        assign centers with GT, which we implement here.

        NOTE: This function is NOT BATCHED. Call separately for GT box batches.

        Args:
            locations_per_fpn_level: Centers at different levels of FPN (p3, p4, p5),
                that are already projected to absolute co-ordinates in input image
                dimension. Dictionary of three keys: (p3, p4, p5) giving tensors of
                shape `(H * W, 2)` where H = W is the size of feature map.
            strides_per_fpn_level: Dictionary of same keys as above, each with an
                integer value giving the stride of corresponding FPN level.
                See `common.py` for more details.
            gt_boxes: GT boxes of a single image, a batch of `(M, 5)` boxes with
                absolute co-ordinates and class ID `(x1, y1, x2, y2, C)`. In this
                codebase, this tensor is directly served by the dataloader.

        Returns:
            Dict[str, torch.Tensor]
                Dictionary with same keys as `shape_per_fpn_level` and values as
                tensors of shape `(N, 5)` GT boxes, one for each center. They are
                one of M input boxes, or a dummy box called "background" that is
                `(-1, -1, -1, -1, -1)`. Background indicates that the center does
                not belong to any object.
        """

        matched_gt_boxes = {
            level_name: None for level_name in locations_per_fpn_level.keys()
        }

        # Do this matching individually per FPN level.
        for level_name, centers in locations_per_fpn_level.items():

            # Get stride for this FPN level.
            stride = strides_per_fpn_level[level_name]

            # get feature location's centre
            x, y = centers.unsqueeze(dim=2).unbind(dim=1)
            x0, y0, x1, y1 = gt_boxes[:, :4].unsqueeze(dim=0).unbind(dim=2)
            pairwise_dist = torch.stack([x - x0, y - y0, x1 - x, y1 - y], dim=2)

            # Pairwise distance between every feature center and GT box edges:
            # shape: (num_gt_boxes, num_centers_this_level, 4)
            pairwise_dist = pairwise_dist.permute(1, 0, 2)

            # The original FCOS anchor matching rule: anchor point must be inside GT.
            match_matrix = pairwise_dist.min(dim=2).values > 0

            # Multilevel anchor matching in FCOS: each anchor is only responsible
            # for certain scale range.
            # Decide upper and lower bounds of limiting targets.
            pairwise_dist = pairwise_dist.max(dim=2).values

            lower_bound = stride * 4 if level_name != "p3" else 0
            upper_bound = stride * 8 if level_name != "p5" else float("inf")
            match_matrix &= (pairwise_dist > lower_bound) & (
                pairwise_dist < upper_bound
            )

            # Match the GT box with minimum area, if there are multiple GT matches.
            gt_areas = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (
                gt_boxes[:, 3] - gt_boxes[:, 1]
            )

            # Get matches and their labels using match quality matrix.
            match_matrix = match_matrix.to(torch.float32)
            match_matrix *= 1e8 - gt_areas[:, None]

            # Find matched ground-truth instance per anchor (un-matched = -1).
            match_quality, matched_idxs = match_matrix.max(dim=0)
            matched_idxs[match_quality < 1e-5] = -1

            # Anchors with label 0 are treated as background.
            matched_boxes_this_level = gt_boxes[matched_idxs.clip(min=0)]
            matched_boxes_this_level[matched_idxs < 0, :] = -1

            matched_gt_boxes[level_name] = matched_boxes_this_level

        return matched_gt_boxes

[Figure: the yellow dots are the selected feature-map locations; the red boxes are the GT boxes matched to them.]

4. Box regression targets from the GT boxes

Box regression has four predicted values, (left, top, right, bottom), i.e. the distances from a feature-map location to the four sides of its box.

FCOS normalizes these four distances by the stride of each level.

For P3, for example:

stride: 8

location (yellow dot): (xc, yc), shape (N, 2)

GT box: (x1, y1, x2, y2), shape (N, 4) without labels, or (N, 5) with labels

l = (xc - x1) / stride              t = (yc - y1) / stride
r = (x2 - xc) / stride              b = (y2 - yc) / stride

We store these four distances in deltas of shape (N, 4), where the 4 entries are left (l), top (t), right (r), bottom (b), computed with the formulas above:

    deltas = torch.empty(gt_boxes.shape[0], 4).to(device=gt_boxes.device, dtype=gt_boxes.dtype)
    # GT boxes have co-ordinates `(x1, y1, x2, y2)`; locations are `(xc, yc)`
    # l = (xc - x1) / stride
    deltas[:, 0] = (locations[:, 0] - gt_boxes[:, 0]) / stride
    # t = (yc - y1) / stride
    deltas[:, 1] = (locations[:, 1] - gt_boxes[:, 1]) / stride
    # r = (x2 - xc) / stride
    deltas[:, 2] = (gt_boxes[:, 2] - locations[:, 0]) / stride
    # b = (y2 - yc) / stride
    deltas[:, 3] = (gt_boxes[:, 3] - locations[:, 1]) / stride

If a GT box is background, represented by default as (-1, -1, -1, -1) or (-1, -1, -1, -1, -1), its deltas must be set to (-1, -1, -1, -1).

To find the background boxes, take the first 4 columns of the GT boxes and check whether each row sums to -4; if it does, set the corresponding deltas to -1.

    # If GT boxes are "background", then deltas must be `(-1, -1, -1, -1)`.
    # You may assume that all the background boxes will be `(-1, -1, -1, -1)` or `(-1, -1, -1, -1, -1)`.
    deltas[gt_boxes[:, :4].sum(dim=1) == -4] = -1

Once we have the deltas, we can apply them back to the feature-map locations to recover the box coordinates:

    # x_min = c_x - l * s
    # y_min = c_y - t * s
    # x_max = c_x + r * s
    # y_max = c_y + b * s
    deltas = deltas.clip(min=0)
    output_boxes = torch.empty(deltas.size()).to(device=deltas.device, dtype=deltas.dtype)
    output_boxes[:, 3] = locations[:, 1] + stride * deltas[:, 3]
    output_boxes[:, 2] = locations[:, 0] + stride * deltas[:, 2]
    output_boxes[:, 1] = locations[:, 1] - stride * deltas[:, 1]
    output_boxes[:, 0] = locations[:, 0] - stride * deltas[:, 0]

5. Centre-ness

Centre-ness re-weights the classification score of each location. It measures how far a feature-map location is from the center of its matched box, with the goal of suppressing low-quality detections from locations far away from object centers. The first version of the paper explains why this improves localization quality.

Centre-ness is computed as:

 centerness = \sqrt{\frac{\min(left, right) \cdot \min(top, bottom)}{\max(left, right) \cdot \max(top, bottom)}}

    center_ness = torch.empty(deltas.shape[0]).to(device=deltas.device, dtype=deltas.dtype)
    # note the parentheses: the min and max products must each be grouped
    center_ness = torch.sqrt((torch.min(deltas[:, 0], deltas[:, 2]) * torch.min(deltas[:, 1], deltas[:, 3])) /
                             (torch.max(deltas[:, 0], deltas[:, 2]) * torch.max(deltas[:, 1], deltas[:, 3])))
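As a quick sanity check (an illustrative example, not part of the assignment code), a location at the exact center of its box gets centerness 1, while a location near an edge gets a value close to 0:

    import torch

    # LTRB deltas for three locations: box center, off-center, near the left edge
    deltas = torch.tensor([[4.0, 4.0, 4.0, 4.0],
                           [2.0, 4.0, 6.0, 4.0],
                           [0.1, 4.0, 7.9, 4.0]])
    ctr = torch.sqrt((torch.min(deltas[:, 0], deltas[:, 2]) * torch.min(deltas[:, 1], deltas[:, 3])) /
                     (torch.max(deltas[:, 0], deltas[:, 2]) * torch.max(deltas[:, 1], deltas[:, 3])))
    print(ctr)  # tensor([1.0000, 0.5774, 0.1125])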

6. Loss

FCOS has three predictions, each with its own loss branch:

Object classification

Classification uses Focal Loss, an extension of the cross-entropy loss designed to handle class imbalance.

                FL(p_t) = -(1 - p_t)^\gamma \log(p_t)

  • p_t is the predicted probability that a feature-map location belongs to the target class; it is compared against the GT box label.
  • \gamma is a tunable focusing parameter (commonly set to 2).
  • The class imbalance arises because, when computing the deltas, most locations are assigned to background; without intervention the model would mostly learn these background locations.

Detailed steps:

1. Compute the per-class probabilities with a sigmoid:

p = torch.sigmoid(inputs)

2. Compute the cross-entropy term; the -\log(p_t) in the formula above is in fact the binary cross-entropy loss:

ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")

3. Because foreground and background are imbalanced, a weighting factor alpha is introduced: α ∈ [0, 1] is applied to positive targets and 1 − α to negative ones. We also compute p_t, the probability assigned to the true class:

    # p_t: probability of the true class (1 - p is the probability of background)
    p_t = p * targets + (1 - p) * (1 - targets)

4. Combine the terms to get the focal loss:

loss = ce_loss * ((1 - p_t) ** gamma)
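Putting the pieces together, here is a minimal sketch of a sigmoid focal loss, following the structure of torchvision.ops.sigmoid_focal_loss (the function name is illustrative; the alpha weighting is the optional part described in step 3):

    import torch
    import torch.nn.functional as F

    def sigmoid_focal_loss_sketch(inputs, targets, alpha=0.25, gamma=2.0):
        # inputs: raw logits (N, num_classes); targets: one-hot GT (N, num_classes)
        p = torch.sigmoid(inputs)
        ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")
        p_t = p * targets + (1 - p) * (1 - targets)
        loss = ce_loss * ((1 - p_t) ** gamma)
        if alpha >= 0:
            # alpha for positive targets, (1 - alpha) for negative ones
            alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
            loss = alpha_t * loss
        return loss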

Box regression:

  • Here an L1 loss is used; it is faster to compute than the IoU loss used in FCOS, although IoU gives somewhat better results. This loss minimizes the difference between the predicted and GT LTRB deltas.
    # First calculate box reg loss, comparing predicted boxes and GT boxes.
    dummy_gt_deltas = fcos_get_deltas_from_locations(
        dummy_locations, dummy_gt_boxes, stride=32
    )
    # Multiply with 0.25 to average across four LTRB components.
    loss_box = 0.25 * F.l1_loss(
        dummy_pred_boxreg_deltas, dummy_gt_deltas, reduction="none"
    )
    # No loss for background:
    loss_box[dummy_gt_deltas < 0] *= 0.0
    print("Box regression loss (L1):", loss_box)

Centerness regression:

Because both the predicted centerness and the GT target lie in [0, 1], binary cross-entropy is a natural choice.

    # Now calculate centerness loss.
    centerness_loss = F.binary_cross_entropy_with_logits(
        dummy_pred_ctr_logits, dummy_gt_centerness, reduction="none"
    )
    # No loss for background:
    centerness_loss[dummy_gt_centerness < 0] *= 0.0
    print("Centerness loss (BCE):", centerness_loss)

Total loss:

Every location has three predictions. If a location is assigned to background, its box regression and centerness losses are set to 0, since the GT targets are not defined there.

The total loss is obtained by summing the three losses and dividing by the number of non-background locations, which depends on the number of objects in the image. A minimal sketch of this normalization is shown below.
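This sketch only illustrates the idea; the names num_pos and total_loss are assumptions, and the assignment code may normalize slightly differently (for example with a moving average of the foreground count):

    # loss_cls, loss_box, loss_ctr: per-location losses; matched_gt_boxes: (B, N, 5)
    # collated GT boxes where class -1 marks background locations.
    num_pos = (matched_gt_boxes[:, :, 4] >= 0).sum().clamp(min=1)
    total_loss = (loss_cls.sum() + loss_box.sum() + loss_ctr.sum()) / num_pos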

7. The detection model

Combine the backbone (with FPN) and the prediction network:

    self.backbone = DetectorBackboneWithFPN(fpn_channels)
    self.pred_net = FCOSPredictionNetwork(num_classes, fpn_channels, stem_channels)

1. Feed the images to the network to get the FPN features and the three predictions:

    fpn_info = self.backbone(images)
    pred_cls_logits, pred_boxreg_deltas, pred_ctr_logits = self.pred_net(fpn_info)

2. Get the locations on each FPN feature map:

    fpn_shape = {"p3": fpn_info["p3"].shape, "p4": fpn_info["p4"].shape, "p5": fpn_info["p5"].shape}
    locations_per_fpn_level = get_fpn_location_coords(fpn_shape, self.backbone.fpn_strides, device=images.device)

3. Assign GT boxes to the locations

  • Get the matched GT boxes:

    matched_gt_boxes = []
    # Replace "pass" statement with your code
    for i in range(images.shape[0]):
        matched_gt_boxes.append(
            fcos_match_locations_to_gt(locations_per_fpn_level, self.backbone.fpn_strides, gt_boxes[i, :, :]))
  • Compute the deltas (LTRB) for each matched GT box:

    matched_gt_deltas = []
    # Replace "pass" statement with your code
    for i in range(images.shape[0]):
        matched_delta = {}
        for level_name, feat_location in locations_per_fpn_level.items():
            matched_delta[level_name] = fcos_get_deltas_from_locations(feat_location,
                                                                       matched_gt_boxes[i][level_name],
                                                                       self.backbone.fpn_strides[level_name])
        matched_gt_deltas.append(matched_delta)
  • Apply the predicted deltas to compute the predicted boxes:

    # Calculate predicted boxes from the predicted deltas. Similar structure
    # as `matched_gt_boxes` above. Fill this list:
    pred_boxes = []
    # Replace "pass" statement with your code
    for i in range(images.shape[0]):
        pred_box = {}
        for level_name in locations_per_fpn_level.keys():
            pred_box[level_name] = fcos_apply_deltas_to_locations(pred_boxreg_deltas[level_name][i],
                                                                  locations_per_fpn_level[level_name],
                                                                  self.backbone.fpn_strides[level_name])
        pred_boxes.append(pred_box)

4. Compute the losses for the three predictions

4.1 Classification loss

    # get the GT class of each location
    gt_classes = matched_gt_boxes[:, :, 4].clone()
    # background locations have GT class -1
    bg_mask = gt_classes == -1
    # temporarily map background to class 0 so one-hot encoding works
    gt_classes[bg_mask] = 0
    # one-hot encode the GT classes
    gt_classes_one_hot = torch.nn.functional.one_hot(gt_classes.long(), self.num_classes)
    gt_classes_one_hot = gt_classes_one_hot.to(gt_boxes.dtype)
    # zero out the one-hot vectors of background locations
    gt_classes_one_hot[bg_mask] = 0
    # classification loss
    loss_cls = sigmoid_focal_loss(inputs=pred_cls_logits, targets=gt_classes_one_hot)

4.2 Box regression loss (IoU loss)

    # reshape the predicted deltas to (N, 4)
    pred_boxreg_deltas = pred_boxreg_deltas.reshape(-1, 4)
    # reshape the matched GT deltas to (N, 4)
    matched_gt_deltas = matched_gt_deltas.reshape(-1, 4)
    # find the background locations
    matched_boxes = matched_gt_boxes[:, :, 4].clone().reshape(-1)
    background_mask = matched_boxes == -1
    # we can compare L1 loss against IoU loss; here the box loss uses GIoU
    loss_box = torchvision.ops.generalized_box_iou_loss(pred_boxes.reshape(-1, 4),
                                                        matched_gt_boxes[:, :, :4].reshape(-1, 4),
                                                        reduction="none")
    # do not count the loss of background locations
    loss_box[background_mask] = 0

4.3 Centerness loss

    # reshape to a vector
    pred_ctr_logits = pred_ctr_logits.view(-1)
    # get the GT centerness targets
    gt_centerness = fcos_make_centerness_targets(matched_gt_deltas)
    # centerness loss via BCE
    loss_ctr = F.binary_cross_entropy_with_logits(pred_ctr_logits, gt_centerness, reduction="none")
    loss_ctr[gt_centerness <= 0] = 0

8. Non-maximum suppression (NMS)

Object detectors often output many duplicate boxes for the same object. NMS removes them as follows; a small code sketch follows the example below.

1. Pick the box with the highest score.

2. Discard every remaining box whose IoU with it exceeds a threshold (e.g. 0.7).

3. If any boxes are left, repeat from step 1 on the remaining boxes.

[Figure: in the example image, IoU values of 0.78, 0.05 and 0.07 against the top-scoring box leave the blue box, and a further IoU of 0.74 leaves the purple box; the color references point to a figure not reproduced here.]
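As a rough illustration of the procedure (a minimal class-agnostic sketch; in practice torchvision.ops.nms would be used):

    import torch

    def nms_sketch(boxes, scores, iou_threshold=0.7):
        # boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,). Returns kept indices.
        order = scores.argsort(descending=True)
        keep = []
        while order.numel() > 0:
            i = order[0]
            keep.append(i.item())
            if order.numel() == 1:
                break
            rest = order[1:]
            # intersection of the top box with the remaining boxes
            x1 = torch.max(boxes[i, 0], boxes[rest, 0])
            y1 = torch.max(boxes[i, 1], boxes[rest, 1])
            x2 = torch.min(boxes[i, 2], boxes[rest, 2])
            y2 = torch.min(boxes[i, 3], boxes[rest, 3])
            inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            # keep only boxes whose overlap with the chosen box is below the threshold
            order = rest[iou <= iou_threshold]
        return torch.tensor(keep, dtype=torch.long)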

References:

1. 一阶目标检测(one-stage object detection)整理归纳
2. FCOS: Fully Convolutional One-Stage Object Detection
3. FCOS: A Simple and Strong Anchor-free Object Detector
4. EECS 498-007 / 598-005 Deep Learning for Computer Vision
5. Feature Pyramid Networks for Object Detection
6. 1.1.2 FPN结构详解
7. FCOS Walkthrough: The Fully Convolutional Approach to Object Detection
8. FCOS网络解析
9. Focal Loss for Dense Object Detection
10. Overlapping Boxes: Non-Max Suppression (NMS)
