
YOLO V8-Detection [Batch Image Inference]: A Detailed Inference Walkthrough and Deployment

Preface

In practice, when we run inference with YOLO V8, we usually infer on a single image at a time. To handle multiple images, we can simply loop over them and run inference image by image.
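For reference, a minimal loop-based sketch might look like this (the ./images directory and the *.jpg pattern are assumptions for illustration):

from pathlib import Path
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
# Run inference one image at a time; each call returns a list of Results
for img_path in Path('./images').glob('*.jpg'):
    results = model(str(img_path))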

For single-image inference, note that the input size must be a multiple of 32, otherwise inference may fail. The example below shows single-image inference with PyTorch and the Ultralytics library:

import torch
from ultralytics import YOLO

# Load a pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32
source = torch.rand(1, 3, 640, 640, dtype=torch.float32)

# Run inference on the source
results = model(source)  # list of Results objects

Batch inference likewise requires the image size to be a multiple of 32. The example below shows batch inference over multiple images with PyTorch and the Ultralytics library:

import torch
from ultralytics import YOLO

# Load a pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Create a random torch tensor of BCHW shape (4, 3, 640, 640) with values in range [0, 1] and type float32
source = torch.rand(4, 3, 640, 640, dtype=torch.float32)

# Run inference on the source
results = model(source)  # list of Results objects

Note that even though batch inference processes several images per call, under the hood the images are still handled one by one in a loop. In the rest of this article we introduce a more efficient way to do batch inference, for higher speed and better performance.

Below we show how to turn the single-image detection code into batch-image inference code.

1. Preprocessing for Batch Inference

Original code

@staticmethod
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    # minimum rectangle
    dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

def precess_image(self, img_src, img_size, half, device):
    # Padded resize
    img = self.letterbox(img_src, img_size)[0]
    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)
    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()  # uint8 to fp16/32
    img = img / 255  # 0 - 255 to 0.0 - 1.0
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim
    return img

Approach

We first need to understand what the original pipeline does. It consists of the following steps:

  • self.pre_transform: i.e. letterbox, which adds the gray padding bars

  • img.transpose((2, 0, 1))[::-1]: HWC to CHW, BGR to RGB

  • torch.from_numpy: to Tensor

  • img.float(): uint8 to fp32

  • img / 255: divide by 255 to normalize

  • img[None]: add a batch dimension

Of these steps, the only one that really needs rework is what happens inside self.pre_transform; every other step already operates on a whole batch directly, as the sketch below shows.
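To see this concretely, here is a minimal sketch (shapes and values are illustrative) applying the non-letterbox steps to a whole batch at once:

import numpy as np
import torch

batch = np.random.randint(0, 255, (4, 640, 640, 3), dtype=np.uint8)  # NHWC, BGR
batch = batch[..., ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, NHWC to NCHW
batch = np.ascontiguousarray(batch)
tensor = torch.from_numpy(batch).float() / 255  # uint8 to fp32, normalize
# no img[None] needed: the batch dimension already exists
print(tensor.shape)  # torch.Size([4, 3, 640, 640])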

The heart of letterbox is the two functions below, implemented with OpenCV. OpenCV cannot operate on a whole batch at once; batched versions are usually built with broadcasting or tensor operations.

im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

Since what is ultimately fed to the model is a tensor, we implement the batch version here with tensor operations.

Resizing

# Original method:
im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
# New method:
resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)

Effect of the two methods:

Original method: (1176, 1956, 3) → (385, 640, 3)

New method: (1176, 1956, 3) → (385, 640, 3)
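One pitfall worth verifying yourself: cv2.resize takes the target size as (width, height), while F.interpolate takes it as (height, width). A small shape check using the sizes quoted above:

import cv2
import numpy as np
import torch
import torch.nn.functional as F

im = np.zeros((1176, 1956, 3), dtype=np.uint8)
out_cv = cv2.resize(im, (640, 385), interpolation=cv2.INTER_LINEAR)  # size is (w, h)
print(out_cv.shape)  # (385, 640, 3)

t = torch.zeros(1, 3, 1176, 1956)
out_t = F.interpolate(t, size=(385, 640), mode='bilinear', align_corners=False)  # size is (h, w)
print(out_t.shape)  # torch.Size([1, 3, 385, 640])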

Adding the border

# Original method:
im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
# New method (F.pad's 4-tuple is ordered (left, right, top, bottom) over the last two dims):
padded_tensor = F.pad(resized_tensor, (left, right, top, bottom), mode='constant', value=padding_value)

Effect of the two methods:

Original method: (385, 640, 3) → (416, 640, 3)

New method: (385, 640, 3) → (416, 640, 3)
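The argument orders differ here as well: cv2.copyMakeBorder takes top, bottom, left, right as separate arguments, while F.pad's 4-tuple pads (left, right, top, bottom) over the last two dimensions. A minimal check with the padding from the example above:

import torch
import torch.nn.functional as F

t = torch.zeros(1, 3, 385, 640)
# pad 15 px on top and 16 px on the bottom, nothing left/right
padded = F.pad(t, (0, 0, 15, 16), mode='constant', value=114)
print(padded.shape)  # torch.Size([1, 3, 416, 640])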

Modified code

def tensor_process(self, image_cv):
    img_shape = image_cv.shape[1:]  # (h, w, 3) of the NHWC batch
    new_shape = [640, 640]
    r = min(new_shape[0] / img_shape[0], new_shape[1] / img_shape[1])

    # Compute padding
    new_unpad = int(round(img_shape[0] * r)), int(round(img_shape[1] * r))  # (h, w)
    dh, dw = new_shape[0] - new_unpad[0], new_shape[1] - new_unpad[1]  # hw padding
    dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # minimum rectangle padding
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    padding_value = 114

    # Convert: BGR to RGB, BHWC to BCHW (kept consistent with the full version below)
    image_cv = np.ascontiguousarray(image_cv[..., ::-1].transpose((0, 3, 1, 2)))
    image_tensor = torch.from_numpy(image_cv).float()
    image_tensor = image_tensor.to(self.device)
    resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)
    # F.pad's 4-tuple pads (left, right, top, bottom) over the last two dimensions
    padded_tensor = F.pad(resized_tensor, (left, right, top, bottom), mode='constant', value=padding_value)
    infer_tensor = padded_tensor / 255.0
    return infer_tensor
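Hypothetical usage, assuming an instance infer_obj of the class holding tensor_process and a set of images that all share the same original size (the single set of padding parameters computed above only fits one input shape); the file names and stacking step are illustrative:

import cv2
import numpy as np

paths = ['img_0.jpg', 'img_1.jpg']                 # hypothetical files of identical size
frames = np.stack([cv2.imread(p) for p in paths])  # (2, H, W, 3), BGR
batch = infer_obj.tensor_process(frames)           # (2, 3, new_h, new_w) padded tensor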

2. Postprocessing for Batch Inference

Original code

def non_max_suppression(
    prediction,
    conf_thres=0.25,
    iou_thres=0.45,
    classes=None,
    agnostic=False,
    multi_label=False,
    labels=(),
    max_det=300,
    nc=0,  # number of classes (optional)
    max_time_img=0.05,
    max_nms=30000,
    max_wh=7680,
    rotated=False,
):
    """
    Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

    Args:
        prediction (torch.Tensor): A tensor of shape (batch_size, num_classes + 4 + num_masks, num_boxes)
            containing the predicted boxes, classes, and masks. The tensor should be in the format
            output by a model, such as YOLO.
        conf_thres (float): The confidence threshold below which boxes will be filtered out.
            Valid values are between 0.0 and 1.0.
        iou_thres (float): The IoU threshold below which boxes will be filtered out during NMS.
            Valid values are between 0.0 and 1.0.
        classes (List[int]): A list of class indices to consider. If None, all classes will be considered.
        agnostic (bool): If True, the model is agnostic to the number of classes, and all
            classes will be considered as one.
        multi_label (bool): If True, each box may have multiple labels.
        labels (List[List[Union[int, float, torch.Tensor]]]): A list of lists, where each inner
            list contains the apriori labels for a given image. The list should be in the format
            output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2).
        max_det (int): The maximum number of boxes to keep after NMS.
        nc (int, optional): The number of classes output by the model. Any indices after this will be considered masks.
        max_time_img (float): The maximum time (seconds) for processing one image.
        max_nms (int): The maximum number of boxes into torchvision.ops.nms().
        max_wh (int): The maximum box width and height in pixels.

    Returns:
        (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of
            shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns
            (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
    """
    # Checks
    assert 0 <= conf_thres <= 1, f"Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0"
    assert 0 <= iou_thres <= 1, f"Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0"
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    bs = prediction.shape[0]  # batch size
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    time_limit = 0.5 + max_time_img * bs  # seconds to quit after
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = prediction.transpose(-1, -2)  # shape(1,84,6300) to shape(1,6300,84)
    if not rotated:
        prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    t = time.time()
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]) and not rotated:
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 4), device=x.device)
            v[:, :4] = xywh2xyxy(lb[:, 1:5])  # box
            v[range(len(lb)), lb[:, 0].long() + 4] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Detections matrix nx6 (xyxy, conf, cls)
        box, cls, mask = x.split((4, nc, nm), 1)

        if multi_label:
            i, j = torch.where(cls > conf_thres)
            x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = cls.max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        if n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        scores = x[:, 4]  # scores
        if rotated:
            boxes = torch.cat((x[:, :2] + c, x[:, 2:4], x[:, -1:]), dim=-1)  # xywhr
            i = nms_rotated(boxes, scores, iou_thres)
        else:
            boxes = x[:, :4] + c  # boxes (offset by class)
            i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections

        # # Experimental
        # merge = False  # use merge-NMS
        # if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
        #     # Update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
        #     from .metrics import box_iou
        #     iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
        #     weights = iou * scores[None]  # box weights
        #     x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
        #     redundant = True  # require redundant detections
        #     if redundant:
        #         i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            LOGGER.warning(f"WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded")
            break  # time limit exceeded

    return output

Approach

We first need to understand how the original version works.

The key operation is NMS: here it processes the results one image at a time rather than all images together, and that is the main code we need to change.

Before changing it, though, let's walk through the logic of the original version.

Original NMS logic

Only the key steps are shown here:

Compute the maximum over columns 4 through mi, compare it with conf_thres, and obtain a boolean tensor that marks, for each candidate box, whether its best class score exceeds conf_thres:

xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates
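A toy illustration of this candidate mask (shapes and the threshold are illustrative):

import torch

pred = torch.rand(2, 84, 8400)      # (bs, 4 + nc, num_boxes) with nc = 80
xc = pred[:, 4:84].amax(1) > 0.25   # (2, 8400) boolean candidate mask
print(xc.shape, xc.dtype)           # torch.Size([2, 8400]) torch.bool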

Convert the raw predicted boxes from xywh to xyxy:

prediction[..., :4] = xywh2xyxy(prediction[..., :4])

Select the rows marked True to get the preliminary filtered result:

x = x[xc[xi]]  # confidence

Split into boxes, classes, and masks:

box, cls, mask = x.split((4, nc, nm), 1)

Filter again by class score and concatenate into a new result:

conf, j = cls.max(1, keepdim=True)
x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]
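A tiny worked example of this best-class selection (toy numbers, masks omitted):

import torch

box = torch.tensor([[0., 0., 5., 5.]])
cls = torch.tensor([[0.10, 0.70, 0.20]])  # scores for 3 classes
conf, j = cls.max(1, keepdim=True)        # conf = [[0.70]], j = [[1]]
x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > 0.25]
print(x)  # tensor([[0.0000, 0.0000, 5.0000, 5.0000, 0.7000, 1.0000]])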

Run NMS:

boxes = x[:, :4] + c  # boxes (offset by class)
i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS

Keep only the first max_det detections to avoid surplus output:

i = i[:max_det]  # limit detections

New NMS logic

All batches' results in prediction are merged into a single set, that set is fed to batched_nms, and the output is then unpacked into boxes, classes, and confidences.

Find the indices of the True entries (batch index and row number):

true_indices = torch.nonzero(xc)

Use the indices and row numbers to gather the real candidate rows from prediction. Note: this result spans every batch (image):

selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]]

Append the batch index to each selected row to record which image each result belongs to. Note: the batch index can be read as the corresponding image:

new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1)
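A toy run of this flattening step (values are illustrative):

import torch

xc = torch.tensor([[True, False, True],
                   [False, True, False]])  # (bs=2, num_boxes=3)
pred = torch.arange(2 * 3 * 6, dtype=torch.float32).reshape(2, 3, 6)
true_indices = torch.nonzero(xc)           # [[0, 0], [0, 2], [1, 1]]
selected = pred[true_indices[:, 0], true_indices[:, 1]]            # (3, 6)
flat = torch.cat((selected, true_indices[:, 0:1].float()), dim=1)  # (3, 7), last column = image index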

Split into boxes, classes, masks, and (batch) indices:

box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1)

Take the highest class confidence and its class index:

conf, j = cls.max(1, keepdim=True)

Filter again by class confidence, keeping the image-index column aligned with the surviving boxes, and concatenate into a new result:

keep_mask = conf.squeeze(-1) > conf_thres
x = torch.cat((box, conf, j.float()), 1)[keep_mask]
idxs = idxs[keep_mask]  # keep the image indices aligned with the surviving boxes

Feed the boxes, scores, indices, and IoU threshold into batched_nms to get the index labels of the final predictions. Note that idxs here carries the image index, so boxes from different images never suppress each other; and since agnostic defaults to True in this code, no class offset is applied within an image.

cls = x[:, 5]  # classes
c = x[:, 5:6] * (0 if agnostic else max_wh)
boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
idxs = idxs.t().squeeze(0)
keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres)

batched_nms: performs non-maximum suppression in a batched fashion. Each index value corresponds to a category, and NMS is not applied between elements of different categories.

Parameters:

boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2

scores (Tensor[N]): the score for each box

idxs (Tensor[N]): the category index for each box

iou_threshold (float): discard all overlapping boxes with IoU > iou_threshold

Returns:

Tensor: int64 indices of the elements kept by NMS, sorted by decreasing score

def batched_nms(boxes: Tensor, scores: Tensor, idxs: Tensor, iou_threshold: float) -> Tensor:
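A minimal demonstration of that guarantee: two identical boxes both survive NMS because they carry different idxs values (here standing in for image indices, as in the code above):

import torch
import torchvision

boxes = torch.tensor([[0., 0., 10., 10.],
                      [0., 0., 10., 10.]])
scores = torch.tensor([0.9, 0.8])
idxs = torch.tensor([0, 1])  # different "images": no cross suppression
keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_threshold=0.5)
print(keep)  # tensor([0, 1])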

Select the final predictions according to the indices kept by NMS:

boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape)
boxes = boxes[keep].cpu().numpy().tolist()
scores = scores[keep].cpu().numpy().tolist()
cls = cls[keep].cpu().numpy().tolist()
idxs = idxs[keep].cpu().numpy().tolist()

Modified code

def non_max_suppression(self, prediction, inferShape, orgShape, conf_thres=0.25, iou_thres=0.45,
                        agnostic=True, multi_label=False, max_wh=7680, nc=0):
    prediction = prediction[0]  # select only inference output
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = prediction.transpose(-1, -2)  # shape(bs,84,6300) to shape(bs,6300,84)
    prediction[..., :4] = self.xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    true_indices = torch.nonzero(xc)
    selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]]
    new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1)
    if new_prediction.shape[0] == 0:
        return

    box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1)
    conf, j = cls.max(1, keepdim=True)
    keep_mask = conf.squeeze(-1) > conf_thres
    x = torch.cat((box, conf, j.float()), 1)[keep_mask]
    idxs = idxs[keep_mask]  # keep the image indices aligned with the filtered boxes
    if not x.shape[0]:  # no boxes
        return

    cls = x[:, 5]  # classes
    c = x[:, 5:6] * (0 if agnostic else max_wh)
    boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
    idxs = idxs.t().squeeze(0)
    # idxs carries the image index: boxes from different images never suppress each other
    keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres)

    boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape)
    boxes = boxes[keep].cpu().numpy()
    scores = scores[keep].cpu().numpy()
    cls = cls[keep].cpu().numpy()
    idxs = idxs[keep].cpu().numpy()
    # each row of results: (x1, y1, x2, y2, score, class, image_index)
    results = np.hstack((boxes, np.expand_dims(scores, axis=1)))
    results = np.hstack((results, np.expand_dims(cls, axis=1)))
    results = np.hstack((results, np.expand_dims(idxs, axis=1)))
    return results

3. Complete Code

With the analysis above, we have seen how YOLO V8-Detection runs inference on batches of images and implemented every step.

The complete inference code is as follows:

# -*- coding:utf-8 -*-
# @author: 牧锦程
# @WeChat Official Account: AI算法与电子竞赛
# @Email: m21z50c71@163.com
# @VX: fylaicai

import os.path
import random

import cv2
import numpy as np
import torch
import torchvision
import torch.nn.functional as F
from ultralytics.nn.autobackend import AutoBackend


class YOLOV8DetectionInfer:
    def __init__(self, weights, cuda, conf_thres, iou_thres) -> None:
        self.imgsz = 640
        self.device = cuda
        self.model = AutoBackend(weights, device=torch.device(cuda))
        self.model.eval()
        self.names = self.model.names
        self.conf = conf_thres
        self.iou = iou_thres
        self.color = {"font": (255, 255, 255)}
        self.color.update(
            {self.names[i]: (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
             for i in range(len(self.names))})

    def infer(self, img_path, save_path):
        img_src = cv2.imread(img_path)
        img_array = np.array([img_src])
        img = self.tensor_process(img_array)
        preds = self.model(img)
        results = self.non_max_suppression(preds, img.shape[2:], img_src.shape, self.conf, self.iou,
                                           nc=len(self.names))
        if results is not None:  # non_max_suppression returns None when nothing survives
            for result in results:
                # result: (x1, y1, x2, y2, score, class, image_index)
                self.draw_box(img_array[int(result[6])], result[:4], result[4], self.names[int(result[5])])
        for i in range(img_array.shape[0]):
            cv2.imwrite(os.path.join(save_path, f"{i}.jpg"), img_array[i])

    def draw_box(self, img_src, box, conf, cls_name):
        lw = max(round(sum(img_src.shape) / 2 * 0.003), 2)  # line width
        tf = max(lw - 1, 1)  # font thickness
        sf = lw / 3  # font scale
        color = self.color[cls_name]
        label = f'{cls_name} {conf:.4f}'
        p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
        # draw the bounding box
        cv2.rectangle(img_src, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA)
        # text width, height
        w, h = cv2.getTextSize(label, 0, fontScale=sf, thickness=tf)[0]
        # does the label fit outside the box?
        outside = box[1] - h - 3 >= 0
        p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
        # draw the filled label background
        cv2.rectangle(img_src, p1, p2, color, -1, cv2.LINE_AA)
        # draw the label text
        cv2.putText(img_src, label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                    0, sf, self.color["font"], thickness=2, lineType=cv2.LINE_AA)

    def tensor_process(self, image_cv):
        img_shape = image_cv.shape[1:]  # (h, w, 3) of the NHWC batch
        new_shape = [640, 640]
        r = min(new_shape[0] / img_shape[0], new_shape[1] / img_shape[1])

        # Compute padding
        new_unpad = int(round(img_shape[0] * r)), int(round(img_shape[1] * r))  # (h, w)
        dh, dw = new_shape[0] - new_unpad[0], new_shape[1] - new_unpad[1]  # hw padding
        dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # minimum rectangle padding
        dw /= 2  # divide padding into 2 sides
        dh /= 2
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        padding_value = 114

        # Convert
        image_cv = image_cv[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
        image_cv = np.ascontiguousarray(image_cv)  # contiguous
        image_tensor = torch.from_numpy(image_cv).float()
        image_tensor = image_tensor.to(self.device)
        resized_tensor = F.interpolate(image_tensor, size=new_unpad, mode='bilinear', align_corners=False)
        # F.pad's 4-tuple pads (left, right, top, bottom) over the last two dimensions
        padded_tensor = F.pad(resized_tensor, (left, right, top, bottom), mode='constant', value=padding_value)
        infer_tensor = padded_tensor / 255.0
        return infer_tensor

    def non_max_suppression(self, prediction, inferShape, orgShape, conf_thres=0.25, iou_thres=0.45,
                            agnostic=True, multi_label=False, max_wh=7680, nc=0):
        prediction = prediction[0]  # select only inference output
        nc = nc or (prediction.shape[1] - 4)  # number of classes
        nm = prediction.shape[1] - nc - 4
        mi = 4 + nc  # mask start index
        xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

        # Settings
        multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

        prediction = prediction.transpose(-1, -2)  # shape(bs,84,6300) to shape(bs,6300,84)
        prediction[..., :4] = self.xywh2xyxy(prediction[..., :4])  # xywh to xyxy

        true_indices = torch.nonzero(xc)
        selected_rows = prediction[true_indices[:, 0], true_indices[:, 1]]
        new_prediction = torch.cat((selected_rows, true_indices[:, 0].unsqueeze(1).float()), dim=1)
        if new_prediction.shape[0] == 0:
            return

        box, cls, mask, idxs = new_prediction.split((4, nc, nm, 1), 1)
        conf, j = cls.max(1, keepdim=True)
        keep_mask = conf.squeeze(-1) > conf_thres
        x = torch.cat((box, conf, j.float()), 1)[keep_mask]
        idxs = idxs[keep_mask]  # keep the image indices aligned with the filtered boxes
        if not x.shape[0]:  # no boxes
            return

        cls = x[:, 5]  # classes
        c = x[:, 5:6] * (0 if agnostic else max_wh)
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        idxs = idxs.t().squeeze(0)
        # idxs carries the image index: boxes from different images never suppress each other
        keep = torchvision.ops.batched_nms(boxes, scores, idxs, iou_thres)

        boxes[keep] = self.scale_boxes(inferShape, boxes[keep], orgShape)
        boxes = boxes[keep].cpu().numpy()
        scores = scores[keep].cpu().numpy()
        cls = cls[keep].cpu().numpy()
        idxs = idxs[keep].cpu().numpy()
        # each row of results: (x1, y1, x2, y2, score, class, image_index)
        results = np.hstack((boxes, np.expand_dims(scores, axis=1)))
        results = np.hstack((results, np.expand_dims(cls, axis=1)))
        results = np.hstack((results, np.expand_dims(idxs, axis=1)))
        return results

    def xywh2xyxy(self, x):
        assert x.shape[-1] == 4, f"input shape last dimension expected 4 but input shape is {x.shape}"
        y = torch.empty_like(x) if isinstance(x, torch.Tensor) else np.empty_like(x)  # faster than clone/copy
        dw = x[..., 2] / 2  # half-width
        dh = x[..., 3] / 2  # half-height
        y[..., 0] = x[..., 0] - dw  # top left x
        y[..., 1] = x[..., 1] - dh  # top left y
        y[..., 2] = x[..., 0] + dw  # bottom right x
        y[..., 3] = x[..., 1] + dh  # bottom right y
        return y

    def clip_boxes(self, boxes, shape):
        if isinstance(boxes, torch.Tensor):  # faster individually (WARNING: inplace .clamp_() Apple MPS bug)
            boxes[..., 0] = boxes[..., 0].clamp(0, shape[1])  # x1
            boxes[..., 1] = boxes[..., 1].clamp(0, shape[0])  # y1
            boxes[..., 2] = boxes[..., 2].clamp(0, shape[1])  # x2
            boxes[..., 3] = boxes[..., 3].clamp(0, shape[0])  # y2
        else:  # np.array (faster grouped)
            boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
            boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
        return boxes

    def scale_boxes(self, img1_shape, boxes, img0_shape, ratio_pad=None, padding=True, xywh=False):
        if ratio_pad is None:  # calculate from img0_shape
            gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
            pad = (
                round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1),
                round((img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1),
            )  # wh padding
        else:
            gain = ratio_pad[0][0]
            pad = ratio_pad[1]

        if padding:
            boxes[..., 0] -= pad[0]  # x padding
            boxes[..., 1] -= pad[1]  # y padding
            if not xywh:
                boxes[..., 2] -= pad[0]  # x padding
                boxes[..., 3] -= pad[1]  # y padding
        boxes[..., :4] /= gain
        return self.clip_boxes(boxes, img0_shape)


if __name__ == '__main__':
    weights = r'yolov8n.pt'
    cuda = 'cuda:0'
    save_path = "./runs"
    if not os.path.exists(save_path):
        os.mkdir(save_path)

    model = YOLOV8DetectionInfer(weights, cuda, 0.25, 0.45)
    img_path = r'./ultralytics/assets/bus.jpg'
    model.infer(img_path, save_path)
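The __main__ block above still feeds a single image. A hedged sketch of a true multi-image call, assuming every image shares the same original resolution (required by the single set of letterbox parameters in tensor_process); the file list is hypothetical:

paths = ['cam0.jpg', 'cam1.jpg', 'cam2.jpg']       # hypothetical files, same H and W
frames = np.stack([cv2.imread(p) for p in paths])  # (3, H, W, 3), BGR
batch = model.tensor_process(frames)
preds = model.model(batch)
results = model.non_max_suppression(preds, batch.shape[2:], frames.shape[1:], 0.25, 0.45,
                                    nc=len(model.names))
# each row of results: (x1, y1, x2, y2, score, class, image_index),
# so the last column routes every detection back to its source image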

4. Book Recommendation

A book recommendation: 《深度学习图解》.

About the book: 《深度学习图解》 aims to help you build a foundation in deep learning so that you can grasp the major deep learning frameworks from a higher level. It starts from the fundamental concepts of neural networks, then digs into more advanced network designs and architectures.

Target readers: the book assumes no prior knowledge of linear algebra, calculus, convex optimization, or even machine learning; everything needed to understand deep learning is explained along the way. If you know high-school math and can program in Python, you are ready to read it.

Follow the official account @AI算法与电子竞赛 below and reply with the keyword "PDF" for the download link.

5. Contact the Author

Feel free to follow my WeChat official account: @AI算法与电子竞赛

Rigid standards can never hold back our limitless potential, so keep pushing forward, everyone!
