
The Bounding-Box Regression Strategy in OSTrack


Contents

I. Cropping and label setup

II. Bounding-box regression from the model's predicted outputs


I. Cropping and label setup

1. Add a random offset to obtain the jittered bounding box

jittered_anno = [self._get_jittered_box(a, s) for a in data[s + '_anno']]
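For reference, the jitter randomly perturbs both the scale and the center of the ground-truth box, controlled by settings such as scale_jitter_factor and center_jitter_factor. A minimal sketch of the idea (illustrative values, not necessarily identical to the repository code):

import torch

def get_jittered_box(box, scale_jitter_factor=0.25, center_jitter_factor=3.0):
    """Sketch: randomly jitter the scale and center of an (x, y, w, h) box tensor."""
    # scale width/height by a log-normal factor
    jittered_size = box[2:4] * torch.exp(torch.randn(2) * scale_jitter_factor)
    # maximum center offset is proportional to the (jittered) box size
    max_offset = jittered_size.prod().sqrt() * center_jitter_factor
    jittered_center = box[0:2] + 0.5 * box[2:4] + max_offset * (torch.rand(2) - 0.5)
    # back to (x, y, w, h)
    return torch.cat((jittered_center - 0.5 * jittered_size, jittered_size), dim=0)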

2. Crop centered on the jittered bounding box

First, a search region is cropped whose area is 4^{2} times the area of the jittered bounding box:

crop_sz = torch.ceil(torch.sqrt(w * h) * self.search_area_factor[s])

sz=\sqrt{w*h}*4
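For example, a jittered box with w = 100, h = 50 and search_area_factor = 4 gives crop_sz = ceil(sqrt(100 * 50) * 4) = ceil(282.8) = 283, i.e. a 283 x 283 square is cut out of the original image.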

Then the crop is extracted and padded:

import math
import cv2 as cv
import numpy as np
import torch.nn.functional as F

def sample_target(im, target_bb, search_area_factor, output_sz=None, mask=None):
    """ Extracts a square crop centered at target_bb box, of area search_area_factor^2 times target_bb area
    args:
        im - cv image
        target_bb - target box [x, y, w, h]
        search_area_factor - Ratio of crop size to target size
        output_sz - (float) Size to which the extracted crop is resized (always square). If None, no resizing is done.
    returns:
        cv image - extracted crop
        float - the factor by which the crop has been resized to make the crop size equal output_size
    """
    if not isinstance(target_bb, list):
        x, y, w, h = target_bb.tolist()
    else:
        x, y, w, h = target_bb

    # Crop image
    crop_sz = math.ceil(math.sqrt(w * h) * search_area_factor)  # 466

    if crop_sz < 1:
        raise Exception('Too small bounding box.')

    x1 = round(x + 0.5 * w - crop_sz * 0.5)
    x2 = x1 + crop_sz
    y1 = round(y + 0.5 * h - crop_sz * 0.5)
    y2 = y1 + crop_sz

    x1_pad = max(0, -x1)
    x2_pad = max(x2 - im.shape[1] + 1, 0)
    y1_pad = max(0, -y1)
    y2_pad = max(y2 - im.shape[0] + 1, 0)

    # Crop target
    im_crop = im[y1 + y1_pad:y2 - y2_pad, x1 + x1_pad:x2 - x2_pad, :]  # ndarray:(466,466,3)
    if mask is not None:
        mask_crop = mask[y1 + y1_pad:y2 - y2_pad, x1 + x1_pad:x2 - x2_pad]  # Tensor:(466,466)

    # Pad (fill the border where the crop region extends beyond the image boundary)
    im_crop_padded = cv.copyMakeBorder(im_crop, y1_pad, y2_pad, x1_pad, x2_pad, cv.BORDER_CONSTANT)  # ndarray:(466,466,3)

    # deal with attention mask: 1 marks padded pixels, 0 marks valid image content
    H, W, _ = im_crop_padded.shape  # 466, 466, 3
    att_mask = np.ones((H, W))  # ndarray:(466,466)
    end_x, end_y = -x2_pad, -y2_pad  # 0, 0
    if y2_pad == 0:
        end_y = None
    if x2_pad == 0:
        end_x = None
    att_mask[y1_pad:end_y, x1_pad:end_x] = 0
    if mask is not None:  # True
        mask_crop_padded = F.pad(mask_crop, pad=(x1_pad, x2_pad, y1_pad, y2_pad), mode='constant', value=0)

3. Resize

    # continuation of sample_target
    if output_sz is not None:  # True
        resize_factor = output_sz / crop_sz
        im_crop_padded = cv.resize(im_crop_padded, (output_sz, output_sz))  # ndarray:(128,128,3)
        att_mask = cv.resize(att_mask, (output_sz, output_sz)).astype(np.bool_)  # ndarray:(128,128), bool
        if mask is None:
            return im_crop_padded, resize_factor, att_mask
        mask_crop_padded = \
            F.interpolate(mask_crop_padded[None, None], (output_sz, output_sz), mode='bilinear',
                          align_corners=False)[0, 0]  # Tensor:(128,128)
        return im_crop_padded, resize_factor, att_mask, mask_crop_padded

The crop is resized to the network input size, and the ratio resize_factor = output_sz / crop_sz is recorded because it is needed later. At this point the cropped input image is fixed, but the label has not yet been aligned to it.
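Putting the pieces together, a typical call looks roughly like this (the file name, box values, template-style factor 2 and output size 128 are placeholders, not values from the repository):

import cv2 as cv

# assuming sample_target from above is importable
im = cv.imread('frame.jpg')                      # hypothetical H x W x 3 frame
jittered_bb = [200, 150, 60, 40]                 # jittered (x, y, w, h)
crop, resize_factor, att_mask = sample_target(im, jittered_bb,
                                              search_area_factor=2, output_sz=128)
# crop: (128, 128, 3); resize_factor = 128 / ceil(sqrt(60 * 40) * 2); att_mask is 1 on padded pixels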

4. Align the label

import torch

def transform_image_to_crop(box_in: torch.Tensor, box_extract: torch.Tensor, resize_factor: float,
                            crop_sz: torch.Tensor, normalize=False) -> torch.Tensor:
    """ Transform the box co-ordinates from the original image co-ordinates to the co-ordinates of the cropped image
    args:
        box_in - the box for which the co-ordinates are to be transformed
        box_extract - the box about which the image crop has been extracted.
        resize_factor - the ratio between the original image scale and the scale of the image crop
        crop_sz - size of the cropped image
    returns:
        torch.Tensor - transformed co-ordinates of box_in
    """
    box_extract_center = box_extract[0:2] + 0.5 * box_extract[2:4]
    box_in_center = box_in[0:2] + 0.5 * box_in[2:4]

    box_out_center = (crop_sz - 1) / 2 + (box_in_center - box_extract_center) * resize_factor
    box_out_wh = box_in[2:4] * resize_factor

    box_out = torch.cat((box_out_center - 0.5 * box_out_wh, box_out_wh))
    if normalize:
        return box_out / crop_sz[0]
    else:
        return box_out

First compute the center coordinates of the jittered bounding box and of the ground-truth bounding box:

x_1, y_1 = x + 0.5*w, y + 0.5*h  (center of the jittered box, i.e. box_extract)

x_0, y_0 = x + 0.5*w, y + 0.5*h  (center of the ground-truth box, i.e. box_in)

where x and y are the top-left corner coordinates of the corresponding box and w, h its width and height.

Next, align the label:

gt_{center} = (output\_sz - 1)/2 + (x_0 - x_1,\ y_0 - y_1) * resize\_factor

where output_sz is the required network input size (this is what the crop_sz argument of transform_image_to_crop holds here).

The center form is then converted back to the top-left corner form (x, y, w, h) and normalized:

return box_out / crop_sz[0]

that is, everything is divided by the input size, e.g. 384 or 256.
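A small worked example using the function above (the box values, factor 2 and size 128 are purely illustrative):

import math
import torch

box_extract = torch.tensor([200., 150., 60., 40.])   # jittered box used for cropping (x, y, w, h)
box_in = torch.tensor([210., 160., 50., 30.])         # ground-truth box in original-image coordinates
output_sz = torch.tensor([128., 128.])                 # network input size (passed as crop_sz)
resize_factor = 128 / math.ceil(math.sqrt(60 * 40) * 2)

gt_label = transform_image_to_crop(box_in, box_extract, resize_factor, output_sz, normalize=True)
# gt_label is the (x, y, w, h) box in the resized crop, divided by 128, so all values lie in [0, 1]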

5. Generate the labels the head predicts

The steps above only align the gt bbox with the cropped input; the labels that the model head is trained against still have to be generated.

1) Classification label

A Gaussian heatmap is generated from the center of the gt bbox:

import torch

def generate_heatmap(bboxes, patch_size=320, stride=16):  # Tensor:(1,4,4), 256, 16
    """
    Generate ground truth heatmap same as CenterNet
    Args:
        bboxes (torch.Tensor): shape of [num_search, bs, 4]
    Returns:
        gaussian_maps: list of generated heatmap
    """
    gaussian_maps = []
    heatmap_size = patch_size // stride  # 16
    for single_patch_bboxes in bboxes:  # Tensor:(4,4)
        bs = single_patch_bboxes.shape[0]  # 4
        gt_scoremap = torch.zeros(bs, heatmap_size, heatmap_size)  # Tensor:(4,16,16)
        classes = torch.arange(bs).to(torch.long)  # tensor([0, 1, 2, 3])
        bbox = single_patch_bboxes * heatmap_size  # Tensor:(4,4)
        wh = bbox[:, 2:]  # Tensor:(4,2)
        centers_int = (bbox[:, :2] + wh / 2).round()  # Tensor:(4,2) center points
        # CenterNetHeatMap is the CenterNet-style Gaussian drawing helper from the training utilities
        CenterNetHeatMap.generate_score_map(gt_scoremap, classes, wh, centers_int, 0.7)
        gaussian_maps.append(gt_scoremap.to(bbox.device))
    return gaussian_maps
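CenterNetHeatMap.generate_score_map writes a 2-D Gaussian around each center, with a spread derived from the box size. A simplified, self-contained sketch of that idea (not the exact CenterNet implementation; the sigma value here is a placeholder):

import torch

def draw_gaussian(scoremap, center, sigma):
    """Write a 2-D Gaussian peaked at `center` (x, y) into an (H, W) score map."""
    H, W = scoremap.shape
    ys = torch.arange(H).view(-1, 1).float()
    xs = torch.arange(W).view(1, -1).float()
    g = torch.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))
    # keep the element-wise maximum so overlapping Gaussians do not erase each other
    torch.maximum(scoremap, g, out=scoremap)

heatmap = torch.zeros(16, 16)
draw_gaussian(heatmap, center=(7.0, 9.0), sigma=1.5)  # in CenterNet, sigma depends on the box size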

2) Regression label

It is simply the gt bbox itself; note, however, that this gt bbox has already been normalized.

Also, the network outputs a score map, a size map and an offset map, so the regression label does not supervise those maps directly; instead, the box assembled from them is compared against the gt bbox (see Section II).

II. Bounding-box regression from the model's predicted outputs

The output head produces three maps:

score_map_ctr, size_map, offset_map = self.get_score_map(x)  # Tensor:(4,1,16,16) , Tensor:(4,2,16,16), Tensor:(4,2,16,16)

Regressing the bounding box:

def cal_bbox(self, score_map_ctr, size_map, offset_map, return_score=False):
    max_score, idx = torch.max(score_map_ctr.flatten(1), dim=1, keepdim=True)  # both Tensor:(4,1): per-sample max score and its flat index
    idx_y = idx // self.feat_sz  # Tensor:(4,1)
    idx_x = idx % self.feat_sz  # Tensor:(4,1)

    idx = idx.unsqueeze(1).expand(idx.shape[0], 2, 1)  # Tensor:(4,2,1)
    size = size_map.flatten(2).gather(dim=2, index=idx)  # Tensor:(4,2,1)
    offset = offset_map.flatten(2).gather(dim=2, index=idx).squeeze(-1)  # Tensor:(4,2)

    # bbox = torch.cat([idx_x - size[:, 0] / 2, idx_y - size[:, 1] / 2,
    #                   idx_x + size[:, 0] / 2, idx_y + size[:, 1] / 2], dim=1) / self.feat_sz
    # cx, cy, w, h
    bbox = torch.cat([(idx_x.to(torch.float) + offset[:, :1]) / self.feat_sz,
                      (idx_y.to(torch.float) + offset[:, 1:]) / self.feat_sz,
                      size.squeeze(-1)], dim=1)  # Tensor:(4,4)

    if return_score:
        return bbox, max_score
    return bbox

The box here is in center form (cx, cy, w, h). During training, these predictions are used directly to compute the loss.
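For context, the training loss compares this predicted box with the normalized gt box and the score map with the Gaussian heatmap from generate_heatmap. A hedged sketch of such a loss (the weights are placeholders, not the official config, and plain BCE stands in for OSTrack's focal-style classification term):

import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou_loss  # available in recent torchvision

def tracking_loss(pred_box, gt_box, score_map_ctr, gt_gaussian_map):
    """Sketch of an OSTrack-style training loss; weights are illustrative."""
    def to_xyxy(b):  # (cx, cy, w, h) -> (x1, y1, x2, y2)
        cx, cy, w, h = b.unbind(-1)
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

    giou = generalized_box_iou_loss(to_xyxy(pred_box), to_xyxy(gt_box), reduction='mean')
    l1 = F.l1_loss(pred_box, gt_box)
    # assumes score_map_ctr is already sigmoid-activated, i.e. values in [0, 1]
    cls = F.binary_cross_entropy(score_map_ctr.flatten(1), gt_gaussian_map.flatten(1))
    return 2.0 * giou + 5.0 * l1 + 1.0 * cls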

At inference time,

pred_box = (pred_boxes.mean(
    dim=0) * self.params.search_size / resize_factor).tolist()  # (cx, cy, w, h) in [0,1]; multiply by search_size (and divide by resize_factor) to un-normalize

This un-normalizes the prediction: multiplying by search_size converts the bbox to the scale of the resized crop, and dividing by resize_factor further brings the crop back onto the same scale as the original image.

def map_box_back(self, pred_box: list, resize_factor: float):
    cx_prev, cy_prev = self.state[0] + 0.5 * self.state[2], self.state[1] + 0.5 * self.state[3]
    cx, cy, w, h = pred_box
    half_side = 0.5 * self.params.search_size / resize_factor
    cx_real = cx + (cx_prev - half_side)
    cy_real = cy + (cy_prev - half_side)
    return [cx_real - 0.5 * w, cy_real - 0.5 * h, w, h]

Here self.state is the bbox predicted in the previous frame. At this point the predicted bbox is expressed in the crop's coordinate system, so mapping it back to the original image requires the transform between the two coordinate systems. Because the crop was centered on the previous frame's box, subtracting half the crop side (half_side, in original-image scale) from the previous box's center gives the crop's top-left corner in the original image; adding that offset to the predicted coordinates yields the prediction in original-image coordinates.
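A quick sanity check with made-up round numbers: suppose the previous frame's box is self.state = [100, 80, 40, 20], so its center is (120, 90), and suppose search_size = 256 and resize_factor = 2.0. Then half_side = 0.5 * 256 / 2 = 64, i.e. the crop covers a 128 x 128 region of the original image centered at (120, 90), whose top-left corner is (120 - 64, 90 - 64) = (56, 26). A predicted center of (70, 65) in crop coordinates (already divided by resize_factor in the previous step) therefore maps back to (70 + 56, 65 + 26) = (126, 91) in the original image.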
