Code download: https://github.com/pakaqiu/yolov3_simple
Video link: https://www.bilibili.com/video/BV1MK4y1X74Q?p=1
The loss function evolved incrementally from YOLOv1 through v3, and in v4 and v5 in particular the loss-function changes bring sizable gains in detection performance. v3 itself also improves considerably over v1 and v2, though that improvement comes not only from the loss function: changes and optimizations to the network architecture contribute a fair share as well. This article focuses on the design of the v3 loss function; we first review v1 and v2:
The v1 loss function:
The v2 loss function:
Relative to v1, v2 only changes how the box width/height loss is computed: the square roots over w and h are removed.
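As a small numeric illustration of what the square root was doing (plain Python; the helper names `v1_wh_term`/`v2_wh_term` are hypothetical, not from the repository): in v1, taking square roots makes the same absolute w/h error cost more on a small box than on a large one, while v2's plain squared error treats both equally.

```python
import math

def v1_wh_term(w_pred, h_pred, w_gt, h_gt):
    # YOLOv1: squared error on the square roots of width/height
    return ((math.sqrt(w_pred) - math.sqrt(w_gt)) ** 2
            + (math.sqrt(h_pred) - math.sqrt(h_gt)) ** 2)

def v2_wh_term(w_pred, h_pred, w_gt, h_gt):
    # YOLOv2: squared error directly on width/height (square root removed)
    return (w_pred - w_gt) ** 2 + (h_pred - h_gt) ** 2

# The same absolute error (0.1 in both w and h) on a small box vs. a large box:
small_v1 = v1_wh_term(0.2, 0.2, 0.1, 0.1)
large_v1 = v1_wh_term(0.9, 0.9, 0.8, 0.8)
small_v2 = v2_wh_term(0.2, 0.2, 0.1, 0.1)
large_v2 = v2_wh_term(0.9, 0.9, 0.8, 0.8)
```

With the square root, the small box is penalized more (`small_v1 > large_v1`); without it, the two penalties are identical.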
The biggest change in v3 relative to v2 is that the classification loss and the box confidence loss are replaced with binary cross-entropy:
In the formula above, S is the size of the grid at the network's output layer and B is the number of anchors. The network outputs an S×S feature map, i.e. an S×S grid; each grid cell has B anchors, yielding S×S×B bounding boxes in total. With that many bounding boxes, how does the loss function regress them? Below, terms (a)-(e) of the v3 loss are analyzed one by one:
$I_{ij}^{obj}$ refers to the j-th anchor of the i-th grid cell: if the j-th anchor is responsible for the object, then $I_{ij}^{obj} = 1$, otherwise it is 0. Each grid cell has B anchors, and the responsible one is the anchor among the B whose IoU with the ground-truth box is the largest.
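The anchor-matching step can be sketched as follows (plain Python; `wh_iou` is a simplified stand-in for the repository's `bbox_wh_iou`, and the anchor/box values are made-up examples). Note that at matching time only widths and heights are compared, with the boxes assumed to share a center:

```python
def wh_iou(anchor_wh, gt_wh):
    # IoU between two boxes assumed to share the same center, so only
    # width/height matter; this is how the responsible anchor is chosen
    # before any offsets are regressed.
    aw, ah = anchor_wh
    gw, gh = gt_wh
    inter = min(aw, gw) * min(ah, gh)
    union = aw * ah + gw * gh - inter
    return inter / union

anchors = [(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)]  # example anchors, grid units
gt = (2.2, 3.4)                                          # example ground-truth w, h
ious = [wh_iou(a, gt) for a in anchors]
best_n = max(range(len(anchors)), key=lambda j: ious[j])  # responsible anchor index
```

Only the `best_n`-th anchor of the cell containing the box center gets $I_{ij}^{obj} = 1$; the other anchors stay at 0.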
$I_{ij}^{noobj}$ indicates that the j-th anchor of the i-th grid cell is not responsible for the object.
Term (a) of the loss above is the error on the object's center coordinates. During training, what is regressed is the offset of the center within its grid cell, which is worth studying closely alongside the code. Each grid cell has B anchors, and only the anchor with the largest IoU is responsible for the regression in that cell.
Term (b) is the error on the object's width and height. Training does not regress the width and height directly; instead, they are regressed relative to the cell's best-matching anchor.
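Terms (a) and (b) together define a box encoding: the center becomes an offset within its grid cell, and the size becomes a log-ratio against the anchor. A minimal sketch of this encoding and its inverse (plain Python; `encode_box`/`decode_box` are hypothetical names, and the numbers are made-up grid-unit values, but the transforms mirror `build_targets` and the `pred_boxes` computation in `forward`):

```python
import math

def encode_box(gx, gy, gw, gh, anchor_w, anchor_h):
    # gx, gy, gw, gh: ground-truth box in grid units.
    # The network regresses offsets, not absolute coordinates.
    gi, gj = int(gx), int(gy)      # responsible grid cell
    tx = gx - gi                   # center offset within the cell, in [0, 1)
    ty = gy - gj
    tw = math.log(gw / anchor_w)   # log-ratio of box size to anchor size
    th = math.log(gh / anchor_h)
    return (gi, gj), (tx, ty, tw, th)

def decode_box(gi, gj, tx, ty, tw, th, anchor_w, anchor_h):
    # Inverse transform, as used at inference to rebuild pred_boxes
    return (gi + tx, gj + ty,
            anchor_w * math.exp(tw), anchor_h * math.exp(th))

cell, t = encode_box(6.3, 4.8, 2.2, 3.4, anchor_w=2.0, anchor_h=3.75)
decoded = decode_box(*cell, *t, anchor_w=2.0, anchor_h=3.75)
```

The round trip recovers the original box, which is the point: the network only ever has to predict small, well-scaled numbers.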
Terms (c) and (d) are the confidence errors, computed with cross-entropy. The confidence error is computed whether or not a cell is responsible for some object; but since most of the input image contains no object and only a small portion does, a weight is added to rein in the confidence loss of the cells that contain no object. Here $\tilde{C}_{j}^{i}$ is the ground-truth value, determined by whether the cell's bounding box is responsible for predicting an object: if responsible, $\tilde{C}_{j}^{i} = 1$, otherwise $\tilde{C}_{j}^{i} = 0$; $C_{j}^{i}$ is the predicted value.
Term (e) is the classification error, also computed with cross-entropy. Only when the j-th anchor of the i-th grid cell is responsible for some ground-truth object does the bounding box produced by that anchor contribute to the classification loss.
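The binary cross-entropy used in (c)-(e) can be written out by hand (plain Python; `bce` is a hypothetical scalar stand-in for the `nn.BCELoss` the repository uses on tensors). It shows why the noobj weight matters: a cell that is confidently wrong about containing an object is punished heavily, and there are vastly more empty cells than occupied ones.

```python
import math

def bce(p, t, eps=1e-12):
    # Binary cross-entropy between predicted probability p and target t
    return -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))

# Confidence targets: 1 for the responsible anchor, 0 for non-responsible ones
loss_obj = bce(0.9, 1.0)    # confident and correct -> small loss
loss_noobj = bce(0.9, 0.0)  # confident but wrong -> large loss
```

For classification, v3 applies the same per-class sigmoid-plus-BCE independently to each class rather than a softmax, so one box can carry multiple labels.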
The concrete implementation:
def forward(self, x, targets=None, img_dim=None):

    # Tensors for cuda support
    FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
    LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
    ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor

    self.img_dim = img_dim
    num_samples = x.size(0)
    grid_size = x.size(2)

    prediction = (
        x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
        .permute(0, 1, 3, 4, 2)
        .contiguous()
    )

    # Get outputs
    x = torch.sigmoid(prediction[..., 0])  # Center x
    y = torch.sigmoid(prediction[..., 1])  # Center y
    w = prediction[..., 2]  # Width
    h = prediction[..., 3]  # Height
    pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
    pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.

    # If grid size does not match current we compute new offsets
    if grid_size != self.grid_size:
        self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

    # Add offset and scale with anchors
    pred_boxes = FloatTensor(prediction[..., :4].shape)
    pred_boxes[..., 0] = x.data + self.grid_x
    pred_boxes[..., 1] = y.data + self.grid_y
    pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
    pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h

    output = torch.cat(
        (
            pred_boxes.view(num_samples, -1, 4) * self.stride,
            pred_conf.view(num_samples, -1, 1),
            pred_cls.view(num_samples, -1, self.num_classes),
        ),
        -1,
    )

    if targets is None:
        return output, 0
    else:
        iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
            pred_boxes=pred_boxes,
            pred_cls=pred_cls,
            target=targets,
            anchors=self.scaled_anchors,
            ignore_thres=self.ignore_thres,
        )

        # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
        loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
        loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
        loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
        loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
        loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
        loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
        loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
        loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
        total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

        # Metrics
        cls_acc = 100 * class_mask[obj_mask].mean()
        conf_obj = pred_conf[obj_mask].mean()
        conf_noobj = pred_conf[noobj_mask].mean()
        conf50 = (pred_conf > 0.5).float()
        iou50 = (iou_scores > 0.5).float()
        iou75 = (iou_scores > 0.75).float()
        detected_mask = conf50 * class_mask * tconf
        precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
        recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
        recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

        self.metrics = {
            "loss": to_cpu(total_loss).item(),
            "x": to_cpu(loss_x).item(),
            "y": to_cpu(loss_y).item(),
            "w": to_cpu(loss_w).item(),
            "h": to_cpu(loss_h).item(),
            "conf": to_cpu(loss_conf).item(),
            "cls": to_cpu(loss_cls).item(),
            "cls_acc": to_cpu(cls_acc).item(),
            "recall50": to_cpu(recall50).item(),
            "recall75": to_cpu(recall75).item(),
            "precision": to_cpu(precision).item(),
            "conf_obj": to_cpu(conf_obj).item(),
            "conf_noobj": to_cpu(conf_noobj).item(),
            "grid_size": grid_size,
        }

        return output, total_loss
def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):

    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]

    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)  # pick the anchor with the largest IoU

    # Separate target values
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()

    # Set masks
    obj_mask[b, best_n, gj, gi] = 1  # set obj_mask to 1 at the best anchor's position
    noobj_mask[b, best_n, gj, gi] = 0  # set noobj_mask to 0 at the best anchor's position

    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):
        # anchors whose IoU with the ground truth exceeds ignore_thres also get
        # noobj_mask = 0, i.e. such anchors are simply ignored by the loss
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()  # convert center coordinates to offsets
    ty[b, best_n, gj, gi] = gy - gy.floor()

    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)  # encode size relative to the anchor
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)

    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1

    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
My understanding is limited; please point out anything improper. Thank you!