This article walks through the YOLOv8-OBB preprocessing and postprocessing pipelines, and along the way adds YOLOv8 support to tensorRT_Pro.
Note: to avoid unnecessary errors, everything below is demonstrated against the fixed YOLOv8 release v8.1.0.
Let's first try running inference on a single image with the official pretrained weights and saving the result, to confirm everything works.

Create a prediction script named predict-obb.py in the YOLOv8 root directory with the following content:
```python
import cv2
import torch
import numpy as np

from ultralytics import YOLO

def xywhr2xyxyxyxy(center):
    # reference: https://github.com/ultralytics/ultralytics/blob/v8.1.0/ultralytics/utils/ops.py#L545
    is_numpy = isinstance(center, np.ndarray)
    cos, sin = (np.cos, np.sin) if is_numpy else (torch.cos, torch.sin)

    ctr = center[..., :2]
    w, h, angle = (center[..., i : i + 1] for i in range(2, 5))
    cos_value, sin_value = cos(angle), sin(angle)
    vec1 = [w / 2 * cos_value, w / 2 * sin_value]
    vec2 = [-h / 2 * sin_value, h / 2 * cos_value]
    vec1 = np.concatenate(vec1, axis=-1) if is_numpy else torch.cat(vec1, dim=-1)
    vec2 = np.concatenate(vec2, axis=-1) if is_numpy else torch.cat(vec2, dim=-1)
    pt1 = ctr + vec1 + vec2
    pt2 = ctr + vec1 - vec2
    pt3 = ctr - vec1 - vec2
    pt4 = ctr - vec1 + vec2
    return np.stack([pt1, pt2, pt3, pt4], axis=-2) if is_numpy else torch.stack([pt1, pt2, pt3, pt4], dim=-2)

def hsv2bgr(h, s, v):
    h_i = int(h * 6)
    f = h * 6 - h_i
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)

    r, g, b = 0, 0, 0

    if h_i == 0:
        r, g, b = v, t, p
    elif h_i == 1:
        r, g, b = q, v, p
    elif h_i == 2:
        r, g, b = p, v, t
    elif h_i == 3:
        r, g, b = p, q, v
    elif h_i == 4:
        r, g, b = t, p, v
    elif h_i == 5:
        r, g, b = v, p, q

    return int(b * 255), int(g * 255), int(r * 255)

def random_color(id):
    h_plane = (((id << 2) ^ 0x937151) % 100) / 100.0
    s_plane = (((id << 3) ^ 0x315793) % 100) / 100.0
    return hsv2bgr(h_plane, s_plane, 1)

if __name__ == "__main__":

    model = YOLO("yolov8s-obb.pt")

    img = cv2.imread("P0032.jpg")
    results = model(img)[0]
    names   = results.names
    boxes   = results.obb.data.cpu()
    confs   = boxes[..., 5].tolist()
    classes = list(map(int, boxes[..., 6].tolist()))
    boxes   = xywhr2xyxyxyxy(boxes[..., :5])

    for i, box in enumerate(boxes):
        confidence = confs[i]
        label = classes[i]
        color = random_color(label)
        cv2.polylines(img, [np.asarray(box, dtype=int)], True, color, 2)
        caption = f"{names[label]} {confidence:.2f}"
        w, h = cv2.getTextSize(caption, 0, 1, 2)[0]
        left, top = [int(b) for b in box[0]]
        cv2.rectangle(img, (left - 3, top - 33), (left + w + 10, top), color, -1)
        cv2.putText(img, caption, (left, top - 5), 0, 1, (0, 0, 0), 2, 16)

    cv2.imwrite("predict-obb.jpg", img)
    print("save done")
```
In the code above we read an image with OpenCV and feed it into the model, which returns results. results holds the outputs for the different tasks; since ours is an oriented object detection task, we only need the corresponding rotated boxes, boxes.

Once we have boxes, we can draw the rotated boxes together with the predicted classes and confidences onto the image and save it.
The visualization code is adapted from the implementation in tensorRT_Pro; see: app_yolo.cpp#L95

The random-color code is adapted from the implementation in tensorRT_Pro; see: ilogger.cpp#L90
The result image saved after inference looks like this:
Now that the model predicts correctly, we need to write the YOLOv8-OBB preprocessing and postprocessing ourselves, to prepare for the later C++ implementation. Let's look at the preprocessing first.
Debugging shows that the YOLOv8-OBB preprocessing happens in the ultralytics/engine/predictor.py file; see: predictor.py#L113

The code is as follows:
```python
def preprocess(self, im):
    """
    Prepares input image before inference.

    Args:
        im (torch.Tensor | List(np.ndarray)): BCHW for tensor, [(HWC) x B] for list.
    """
    not_tensor = not isinstance(im, torch.Tensor)
    if not_tensor:
        im = np.stack(self.pre_transform(im))
        im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
        im = np.ascontiguousarray(im)  # contiguous
        im = torch.from_numpy(im)

    im = im.to(self.device)
    im = im.half() if self.model.fp16 else im.float()  # uint8 to fp16/32
    if not_tensor:
        im /= 255  # 0 - 255 to 0.0 - 1.0
    return im
```
It consists of the following steps:

- self.pre_transform: the letterbox transform, which resizes the image and pads it with gray
- im[..., ::-1]: BGR to RGB
- transpose((0, 3, 1, 2)): BHWC to BCHW
- np.ascontiguousarray / torch.from_numpy: make the memory contiguous, convert to a torch tensor, and move it to the device
- im.half() / im.float(): uint8 to fp16/fp32
- im /= 255: normalize from 0-255 to 0.0-1.0
If you are familiar with YOLOv5 preprocessing, you will notice that YOLOv8-OBB's preprocessing is exactly the same, so the corresponding code is easy to write, as shown below:
```python
def preprocess_warpAffine(image, dst_width=1024, dst_height=1024):
    scale = min((dst_width / image.shape[1], dst_height / image.shape[0]))
    ox = (dst_width  - scale * image.shape[1]) / 2
    oy = (dst_height - scale * image.shape[0]) / 2
    M = np.array([
        [scale, 0, ox],
        [0, scale, oy]
    ], dtype=np.float32)

    img_pre = cv2.warpAffine(image, M, (dst_width, dst_height), flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_CONSTANT, borderValue=(114, 114, 114))
    IM = cv2.invertAffineTransform(M)

    img_pre = (img_pre[..., ::-1] / 255.0).astype(np.float32)
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre, IM
```
The letterbox gray-padding step can be implemented with a warpAffine affine transform, which is very well suited to CUDA acceleration; for the details of warpAffine, see YOLOv5推理详解及预处理高性能实现, which we won't repeat here. The remaining steps are the same as the official implementation. A minimal round-trip sketch of M and IM follows below.
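To make M and IM concrete, here is a minimal round-trip sketch (my own illustration, assuming a 1689x2425 image and a hypothetical point): a point in the original image is mapped into the 1024x1024 network space with M and back with IM:

```python
import cv2
import numpy as np

# Illustrative round trip: M maps original-image coordinates into the
# 1024x1024 network input space, IM maps them back.
h, w = 1689, 2425
scale = min(1024 / w, 1024 / h)
M = np.array([[scale, 0, (1024 - scale * w) / 2],
              [0, scale, (1024 - scale * h) / 2]], dtype=np.float32)
IM = cv2.invertAffineTransform(M)

pt = np.array([1200.0, 800.0, 1.0])                 # (x, y, 1) in the original image
net_pt = M @ pt                                     # into the network input space
back   = IM @ np.array([net_pt[0], net_pt[1], 1.0]) # back onto the original image
print(net_pt, back)                                 # back ~= (1200, 800)
```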
Note that letterbox first scales the long side to 1024, then scales the short side proportionally while ensuring the scaled short side is divisible by 32; if it is not, it is rounded up and the excess is padded. warpAffine instead fixes the resolution at 1024x1024 and pads all leftover area with gray. I compared the results of the two preprocessing methods on a 1689x2425 image, as shown below:

The difference is clear: letterbox only needs a small gray strip, because after the long side is scaled to 1024 the short side becomes 713, which is then rounded up to the next multiple of 32, giving 736. warpAffine fixes the resolution at 1024x1024, so the entire leftover short-side area is filled with gray.
The warpAffine preprocessing fixes the image resolution at 1024x1024 mainly for considerations like the following (from chatGPT):

- a fixed input resolution yields fixed output tensor shapes, which simplifies memory allocation and engine building when deploying with TensorRT;
- every image goes through one and the same affine transform, which is easy to fuse with BGR→RGB and normalization in a single CUDA kernel;
- a fixed shape makes batching several images together straightforward.
The two preprocessing methods feed the network inputs of different dimensions: letterbox's input is torch.Size([1, 3, 736, 1024]) while warpAffine's is torch.Size([1, 3, 1024, 1024]). The different input dimensions lead to different output dimensions: letterbox's output is torch.Size([1, 20, 15456]), only 15456 boxes, while warpAffine's output is torch.Size([1, 20, 21504]), 21504 boxes; keep this in mind. The sketch after this paragraph reproduces both counts.
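Both box counts are easy to verify by hand: the model predicts one box per grid cell at strides 8, 16 and 32. Below is a small sketch (my own check, not part of the original code) that reproduces both numbers:

```python
import math

def letterbox_shape(h, w, new_shape=1024, stride=32):
    # scale the long side to new_shape, round each side up to a multiple of stride
    scale = new_shape / max(h, w)
    h, w = round(h * scale), round(w * scale)
    return math.ceil(h / stride) * stride, math.ceil(w / stride) * stride

def num_preds(h, w, strides=(8, 16, 32)):
    # one prediction per grid cell at each detection stride
    return sum((h // s) * (w // s) for s in strides)

print(letterbox_shape(1689, 2425))  # (736, 1024)
print(num_preds(736, 1024))         # 15456 boxes for the letterbox input
print(num_preds(1024, 1024))        # 21504 boxes for the warpAffine input
```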
Next, let's look at the postprocessing implementation.

Debugging shows that the YOLOv8-OBB postprocessing happens in the ultralytics/models/yolo/obb/predict.py file; see: obb/predict.py#L10
````python
class OBBPredictor(DetectionPredictor):
    """
    A class extending the DetectionPredictor class for prediction based on an Oriented Bounding Box (OBB) model.

    Example:
        ```python
        from ultralytics.utils import ASSETS
        from ultralytics.models.yolo.obb import OBBPredictor

        args = dict(model='yolov8n-obb.pt', source=ASSETS)
        predictor = OBBPredictor(overrides=args)
        predictor.predict_cli()
        ```
    """

    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        """Initializes OBBPredictor with optional model and data configuration overrides."""
        super().__init__(cfg, overrides, _callbacks)
        self.args.task = "obb"

    def postprocess(self, preds, img, orig_imgs):
        """Post-processes predictions and returns a list of Results objects."""
        preds = ops.non_max_suppression(
            preds,
            self.args.conf,
            self.args.iou,
            agnostic=self.args.agnostic_nms,
            max_det=self.args.max_det,
            nc=len(self.model.names),
            classes=self.args.classes,
            rotated=True,
        )

        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

        results = []
        for i, (pred, orig_img, img_path) in enumerate(zip(preds, orig_imgs, self.batch[0])):
            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape, xywh=True)
            # xywh, r, conf, cls
            obb = torch.cat([pred[:, :4], pred[:, -1:], pred[:, 4:6]], dim=-1)
            results.append(Results(orig_img, path=img_path, names=self.model.names, obb=obb))
        return results
````
It consists of the following steps:

- ops.non_max_suppression: NMS with rotated=True, so that rotated boxes are suppressed with a rotated-box IoU
- ops.scale_boxes: map the box centers and sizes from the network input size back onto the original image
- torch.cat: assemble [xywh, r, conf, cls] into the obb tensor and wrap everything in a Results object
If you are familiar with YOLOv5 postprocessing, you will find YOLOv8-OBB's broadly similar. Why only broadly? Because YOLOv8-OBB works with rotated boxes, so the IoU computation and the box decoding differ slightly. The corresponding postprocessing code is therefore easy to write, as shown below:
```python
def probiou(obb1, obb2, eps=1e-7):
    # Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    def covariance_matrix(obb):
        # Extract elements
        w, h, r = obb[2:5]
        a = (w ** 2) / 12
        b = (h ** 2) / 12
        cos_r = torch.cos(torch.tensor(r))
        sin_r = torch.sin(torch.tensor(r))

        # Calculate covariance matrix elements
        a_val = a * cos_r ** 2 + b * sin_r ** 2
        b_val = a * sin_r ** 2 + b * cos_r ** 2
        c_val = (a - b) * sin_r * cos_r
        return a_val, b_val, c_val

    a1, b1, c1 = covariance_matrix(obb1)
    a2, b2, c2 = covariance_matrix(obb2)
    x1, y1 = obb1[:2]
    x2, y2 = obb2[:2]

    t1 = ((a1 + a2) * ((y1 - y2) ** 2) + (b1 + b2) * ((x1 - x2) ** 2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t3 = torch.log(((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2) / (4 * torch.sqrt(a1 * b1 - c1 ** 2) * torch.sqrt(a2 * b2 - c2 ** 2) + eps) + eps)

    bd = 0.25 * t1 + 0.5 * t2 + 0.5 * t3
    hd = torch.sqrt(1.0 - torch.exp(-torch.clamp(bd, eps, 100.0)) + eps)
    return 1 - hd

def NMS(boxes, iou_thres):

    remove_flags = [False] * len(boxes)

    keep_boxes = []
    for i, ibox in enumerate(boxes):
        if remove_flags[i]:
            continue
        keep_boxes.append(ibox)
        for j in range(i + 1, len(boxes)):
            if remove_flags[j]:
                continue
            jbox = boxes[j]
            if ibox[6] != jbox[6]:
                continue
            if probiou(ibox, jbox) > iou_thres:
                remove_flags[j] = True
    return keep_boxes

def postprocess(pred, IM=[], conf_thres=0.25, iou_thres=0.45):

    # The input is the raw model output, i.e. 21504 prediction boxes
    # 1,21504,20 [cx,cy,w,h,class*15,rotated]
    boxes = []
    for item in pred[0]:
        cx, cy, w, h = item[:4]
        angle = item[-1]
        label = item[4:-1].argmax()
        confidence = item[4 + label]
        if confidence < conf_thres:
            continue
        boxes.append([cx, cy, w, h, angle, confidence, label])

    boxes = np.array(boxes)
    cx = boxes[:, 0]
    cy = boxes[:, 1]
    wh = boxes[:, 2:4]
    boxes[:, 0] = IM[0][0] * cx + IM[0][2]
    boxes[:, 1] = IM[1][1] * cy + IM[1][2]
    boxes[:, 2:4] = IM[0][0] * wh
    boxes = sorted(boxes.tolist(), key=lambda x: x[5], reverse=True)

    return NMS(boxes, iou_thres)
```
The decoding of the prediction boxes is done through the inverse affine matrix IM; for the details of IM, see YOLOv5推理详解及预处理高性能实现, which we won't repeat here. The NMS code is adapted from the implementation in tensorRT_Pro: yolo.cpp#L119

Note that for the IoU computation, YOLOv8 officially uses ProbIoU to measure the similarity of two rotated boxes; for more details see the paper: Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection. A quick sanity check of probiou() follows below.
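As a sanity check of the probiou() defined above (illustrative box values of my own, not from the original post), two identical boxes should score close to 1 and two far-apart boxes close to 0:

```python
import torch

a = [100.0, 100.0, 60.0, 20.0, 0.3]  # cx, cy, w, h, angle
b = [400.0, 400.0, 60.0, 20.0, 1.2]

print(probiou(a, a))  # ~1.0: identical boxes
print(probiou(a, b))  # ~0.0: far-apart boxes
```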
For a 1024x1024 image, YOLOv8-OBB predicts 21504 boxes in total, each of dimension 20 (for the 15 classes of the DOTAv1 dataset):
21504 × 20 = 128 × 128 × 20 + 64 × 64 × 20 + 32 × 32 × 20
           = 128 × 128 × (4 + 15 + 1) + 64 × 64 × (4 + 15 + 1) + 32 × 32 × (4 + 15 + 1)
The 4 corresponds to cx, cy, w, h, i.e. the center coordinates and the width and height of the box; the 15 corresponds to the confidence scores of the 15 DOTAv1 classes; and the 1 corresponds to the rotation angle of the box, whose value lies within [-pi/4, 3pi/4].
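In other words, each of the 21504 output rows is laid out as [cx, cy, w, h, 15 class scores, angle]. A small slice with hypothetical values makes the layout explicit:

```python
import numpy as np

# one hypothetical row out of the 1x21504x20 prediction tensor
row = np.random.rand(20).astype(np.float32)

cx, cy, w, h = row[0:4]   # box center and size, in 1024x1024 network coordinates
scores       = row[4:19]  # confidences for the 15 DOTAv1 classes
angle        = row[19]    # rotation angle in radians, within [-pi/4, 3pi/4]

label = int(scores.argmax())
confidence = scores[label]
```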
Having analyzed the YOLOv8-OBB preprocessing and postprocessing, the whole inference process is now clear. YOLOv8-OBB inference consists of three parts: image preprocessing, model inference, and postprocessing of the predictions; the preprocessing is mainly the warpAffine transform, and the postprocessing mainly consists of decode and NMS.

The complete inference code is as follows:
```python
import cv2
import torch
import numpy as np

from ultralytics.data.augment import LetterBox
from ultralytics.nn.autobackend import AutoBackend

def preprocess_letterbox(image):
    letterbox = LetterBox(new_shape=1024, stride=32, auto=True)
    image = letterbox(image=image)
    image = (image[..., ::-1] / 255.0).astype(np.float32)  # BGR to RGB, 0 - 255 to 0.0 - 1.0
    image = image.transpose(2, 0, 1)[None]  # BHWC to BCHW (n, 3, h, w)
    image = torch.from_numpy(image)
    return image

def preprocess_warpAffine(image, dst_width=1024, dst_height=1024):
    scale = min((dst_width / image.shape[1], dst_height / image.shape[0]))
    ox = (dst_width  - scale * image.shape[1]) / 2
    oy = (dst_height - scale * image.shape[0]) / 2
    M = np.array([
        [scale, 0, ox],
        [0, scale, oy]
    ], dtype=np.float32)

    img_pre = cv2.warpAffine(image, M, (dst_width, dst_height), flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_CONSTANT, borderValue=(114, 114, 114))
    IM = cv2.invertAffineTransform(M)

    img_pre = (img_pre[..., ::-1] / 255.0).astype(np.float32)
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre, IM

def xywhr2xyxyxyxy(center):
    # reference: https://github.com/ultralytics/ultralytics/blob/v8.1.0/ultralytics/utils/ops.py#L545
    is_numpy = isinstance(center, np.ndarray)
    cos, sin = (np.cos, np.sin) if is_numpy else (torch.cos, torch.sin)

    ctr = center[..., :2]
    w, h, angle = (center[..., i : i + 1] for i in range(2, 5))
    cos_value, sin_value = cos(angle), sin(angle)
    vec1 = [w / 2 * cos_value, w / 2 * sin_value]
    vec2 = [-h / 2 * sin_value, h / 2 * cos_value]
    vec1 = np.concatenate(vec1, axis=-1) if is_numpy else torch.cat(vec1, dim=-1)
    vec2 = np.concatenate(vec2, axis=-1) if is_numpy else torch.cat(vec2, dim=-1)
    pt1 = ctr + vec1 + vec2
    pt2 = ctr + vec1 - vec2
    pt3 = ctr - vec1 - vec2
    pt4 = ctr - vec1 + vec2
    return np.stack([pt1, pt2, pt3, pt4], axis=-2) if is_numpy else torch.stack([pt1, pt2, pt3, pt4], dim=-2)

def probiou(obb1, obb2, eps=1e-7):
    # Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    def covariance_matrix(obb):
        # Extract elements
        w, h, r = obb[2:5]
        a = (w ** 2) / 12
        b = (h ** 2) / 12
        cos_r = torch.cos(torch.tensor(r))
        sin_r = torch.sin(torch.tensor(r))

        # Calculate covariance matrix elements
        a_val = a * cos_r ** 2 + b * sin_r ** 2
        b_val = a * sin_r ** 2 + b * cos_r ** 2
        c_val = (a - b) * sin_r * cos_r
        return a_val, b_val, c_val

    a1, b1, c1 = covariance_matrix(obb1)
    a2, b2, c2 = covariance_matrix(obb2)
    x1, y1 = obb1[:2]
    x2, y2 = obb2[:2]

    t1 = ((a1 + a2) * ((y1 - y2) ** 2) + (b1 + b2) * ((x1 - x2) ** 2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t2 = ((c1 + c2) * (x2 - x1) * (y1 - y2)) / ((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2 + eps)
    t3 = torch.log(((a1 + a2) * (b1 + b2) - (c1 + c2) ** 2) / (4 * torch.sqrt(a1 * b1 - c1 ** 2) * torch.sqrt(a2 * b2 - c2 ** 2) + eps) + eps)

    bd = 0.25 * t1 + 0.5 * t2 + 0.5 * t3
    hd = torch.sqrt(1.0 - torch.exp(-torch.clamp(bd, eps, 100.0)) + eps)
    return 1 - hd

def NMS(boxes, iou_thres):

    remove_flags = [False] * len(boxes)

    keep_boxes = []
    for i, ibox in enumerate(boxes):
        if remove_flags[i]:
            continue
        keep_boxes.append(ibox)
        for j in range(i + 1, len(boxes)):
            if remove_flags[j]:
                continue
            jbox = boxes[j]
            if ibox[6] != jbox[6]:
                continue
            if probiou(ibox, jbox) > iou_thres:
                remove_flags[j] = True
    return keep_boxes

def postprocess(pred, IM=[], conf_thres=0.25, iou_thres=0.45):

    # The input is the raw model output, i.e. 21504 prediction boxes
    # 1,21504,20 [cx,cy,w,h,class*15,rotated]
    boxes = []
    for item in pred[0]:
        cx, cy, w, h = item[:4]
        angle = item[-1]
        label = item[4:-1].argmax()
        confidence = item[4 + label]
        if confidence < conf_thres:
            continue
        boxes.append([cx, cy, w, h, angle, confidence, label])

    boxes = np.array(boxes)
    cx = boxes[:, 0]
    cy = boxes[:, 1]
    wh = boxes[:, 2:4]
    boxes[:, 0] = IM[0][0] * cx + IM[0][2]
    boxes[:, 1] = IM[1][1] * cy + IM[1][2]
    boxes[:, 2:4] = IM[0][0] * wh
    boxes = sorted(boxes.tolist(), key=lambda x: x[5], reverse=True)

    return NMS(boxes, iou_thres)

def hsv2bgr(h, s, v):
    h_i = int(h * 6)
    f = h * 6 - h_i
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)

    r, g, b = 0, 0, 0

    if h_i == 0:
        r, g, b = v, t, p
    elif h_i == 1:
        r, g, b = q, v, p
    elif h_i == 2:
        r, g, b = p, v, t
    elif h_i == 3:
        r, g, b = p, q, v
    elif h_i == 4:
        r, g, b = t, p, v
    elif h_i == 5:
        r, g, b = v, p, q

    return int(b * 255), int(g * 255), int(r * 255)

def random_color(id):
    h_plane = (((id << 2) ^ 0x937151) % 100) / 100.0
    s_plane = (((id << 3) ^ 0x315793) % 100) / 100.0
    return hsv2bgr(h_plane, s_plane, 1)

if __name__ == "__main__":

    img = cv2.imread("P0032.jpg")

    # img_pre = preprocess_letterbox(img)
    img_pre, IM = preprocess_warpAffine(img)

    model  = AutoBackend(weights="yolov8s-obb.pt")
    names  = model.names
    result = model(img_pre)[0].transpose(-1, -2)  # 1,21504,20

    boxes   = postprocess(result, IM)
    confs   = [box[5] for box in boxes]
    classes = [int(box[6]) for box in boxes]
    boxes   = xywhr2xyxyxyxy(np.array(boxes)[..., :5])

    for i, box in enumerate(boxes):
        confidence = confs[i]
        label = classes[i]
        color = random_color(label)
        cv2.polylines(img, [np.asarray(box, dtype=int)], True, color, 2)
        caption = f"{names[label]} {confidence:.2f}"
        w, h = cv2.getTextSize(caption, 0, 1, 2)[0]
        left, top = [int(b) for b in box[0]]
        cv2.rectangle(img, (left - 3, top - 33), (left + w + 10, top), color, -1)
        cv2.putText(img, caption, (left, top - 5), 0, 1, (0, 0, 0), 2, 16)

    cv2.imwrite("infer-obb.jpg", img)
    print("save done")
```
The inference result looks like this:

With that, the complete YOLOv8-OBB inference pipeline runs in Python; next we implement it in C++.

For the C++ implementation we stick with the tensorRT_Pro repo; we will now build YOLOv8-OBB inference in C++ on top of tensorRT_Pro.
First we need to export the YOLOv8-OBB model to ONNX. To fit tensorRT_Pro a few modifications are required, mainly the following two:

1. in ultralytics/engine/exporter.py, name the detection output node output and keep only the batch dimension dynamic;
2. in ultralytics/nn/modules/head.py, permute the OBB head's export output so its shape becomes batch x boxes x dims (1x21504x20) instead of 1x20x21504.

The concrete changes are as follows:
1. One change in the ultralytics/engine/exporter.py file
```python
# ========== exporter.py ==========

# ultralytics/engine/exporter.py, line 353
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
#         dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)

# Replace with:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
```
2. One change in the ultralytics/nn/modules/head.py file
```python
# ========== head.py ==========

# ultralytics/nn/modules/head.py, line 141, forward function
# return torch.cat([x, angle], 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# Replace with:
return torch.cat([x, angle], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
```
Those are all the code changes needed to fit tensorRT_Pro. Once they are made, place the pretrained weights yolov8s-obb.pt in the ultralytics-main root directory and create an export script export.py with the following content:
```python
from ultralytics import YOLO

model = YOLO("yolov8s-obb.pt")

success = model.export(format="onnx", dynamic=True, simplify=True)
```
Run the following command in the terminal to export the ONNX model:
```shell
python export.py
```
The export process is shown in the figure below:

As shown, the exported PyTorch model's input shape is 1x3x1024x1024 and its output shape is 1x21504x20, matching our expectations.

After a successful export, a yolov8s-obb.onnx model is generated in the current directory, which we can inspect with the Netron visualization tool, as shown below:

The input node is named images with dimensions batchx3x1024x1024, and the output node is named output with dimensions batchxTransposeoutput_dim_1xTransposeoutput_dim_2; in both cases only the batch dimension is dynamic, which matches the format tensorRT_Pro expects.

Don't be misled by the names Transposeoutput_dim_1 and Transposeoutput_dim_2 into thinking these axes are dynamic as well. The output node's dimensions are derived from the input dimensions and the model structure; these extra symbolic names most likely come from specific operations in the graph, such as the output of the Transpose op, rather than from dynamic axes. In practice these dimensions are static and do not change at inference time.
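Besides Netron, the exported axes can be double-checked programmatically with the onnx package (a quick sketch; the symbolic dim names are whatever the exporter produced):

```python
import onnx

model = onnx.load("yolov8s-obb.onnx")

def dims(value_info):
    # each dim is either symbolic (dim_param, e.g. 'batch') or fixed (dim_value)
    return [d.dim_param or d.dim_value for d in value_info.type.tensor_type.shape.dim]

for inp in model.graph.input:
    print("input :", inp.name, dims(inp))   # images  ['batch', 3, 1024, 1024]
for out in model.graph.output:
    print("output:", out.name, dims(out))   # output  ['batch', 'Transposeoutput_dim_1', 'Transposeoutput_dim_2']
```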
As mentioned earlier, YOLOv8-OBB preprocessing is identical to YOLOv5's, so in tensorRT_Pro the YOLOv8-OBB model can reuse the YOLOv5 preprocessing directly.

The preprocessing code in tensorRT_Pro is as follows:
```cpp
__global__ void warp_affine_bilinear_and_normalize_plane_kernel(
    uint8_t* src, int src_line_size, int src_width, int src_height,
    float* dst, int dst_width, int dst_height,
    uint8_t const_value_st, float* warp_affine_matrix_2_3, Norm norm, int edge
){
    int position = blockDim.x * blockIdx.x + threadIdx.x;
    if (position >= edge) return;

    float m_x1 = warp_affine_matrix_2_3[0];
    float m_y1 = warp_affine_matrix_2_3[1];
    float m_z1 = warp_affine_matrix_2_3[2];
    float m_x2 = warp_affine_matrix_2_3[3];
    float m_y2 = warp_affine_matrix_2_3[4];
    float m_z2 = warp_affine_matrix_2_3[5];

    int dx = position % dst_width;
    int dy = position / dst_width;
    float src_x = m_x1 * dx + m_y1 * dy + m_z1;
    float src_y = m_x2 * dx + m_y2 * dy + m_z2;
    float c0, c1, c2;

    if(src_x <= -1 || src_x >= src_width || src_y <= -1 || src_y >= src_height){
        // out of range
        c0 = const_value_st;
        c1 = const_value_st;
        c2 = const_value_st;
    }else{
        int y_low = floorf(src_y);
        int x_low = floorf(src_x);
        int y_high = y_low + 1;
        int x_high = x_low + 1;

        uint8_t const_value[] = {const_value_st, const_value_st, const_value_st};
        float ly = src_y - y_low;
        float lx = src_x - x_low;
        float hy = 1 - ly;
        float hx = 1 - lx;
        float w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
        uint8_t* v1 = const_value;
        uint8_t* v2 = const_value;
        uint8_t* v3 = const_value;
        uint8_t* v4 = const_value;
        if(y_low >= 0){
            if (x_low >= 0)
                v1 = src + y_low * src_line_size + x_low * 3;
            if (x_high < src_width)
                v2 = src + y_low * src_line_size + x_high * 3;
        }

        if(y_high < src_height){
            if (x_low >= 0)
                v3 = src + y_high * src_line_size + x_low * 3;
            if (x_high < src_width)
                v4 = src + y_high * src_line_size + x_high * 3;
        }

        // same to opencv
        c0 = floorf(w1 * v1[0] + w2 * v2[0] + w3 * v3[0] + w4 * v4[0] + 0.5f);
        c1 = floorf(w1 * v1[1] + w2 * v2[1] + w3 * v3[1] + w4 * v4[1] + 0.5f);
        c2 = floorf(w1 * v1[2] + w2 * v2[2] + w3 * v3[2] + w4 * v4[2] + 0.5f);
    }

    if(norm.channel_type == ChannelType::Invert){
        float t = c2;
        c2 = c0;
        c0 = t;
    }

    if(norm.type == NormType::MeanStd){
        c0 = (c0 * norm.alpha - norm.mean[0]) / norm.std[0];
        c1 = (c1 * norm.alpha - norm.mean[1]) / norm.std[1];
        c2 = (c2 * norm.alpha - norm.mean[2]) / norm.std[2];
    }else if(norm.type == NormType::AlphaBeta){
        c0 = c0 * norm.alpha + norm.beta;
        c1 = c1 * norm.alpha + norm.beta;
        c2 = c2 * norm.alpha + norm.beta;
    }

    int area = dst_width * dst_height;
    float* pdst_c0 = dst + dy * dst_width + dx;
    float* pdst_c1 = pdst_c0 + area;
    float* pdst_c2 = pdst_c1 + area;
    *pdst_c0 = c0;
    *pdst_c1 = c1;
    *pdst_c2 = c2;
}
```
The preprocessing simply invokes the CUDA kernel above to perform the warpAffine; since in CUDA we operate on each pixel individually, steps such as BGR → RGB and /255.0 are very easy to fold in. For a detailed analysis of the code, see YOLOv5推理详解及预处理高性能实现; we won't repeat it here.
As mentioned earlier, YOLOv8-OBB postprocessing is broadly similar to YOLOv5's, but because YOLOv8-OBB carries extra angle information, the decode part needs a small adjustment and the IoU computation must switch to ProbIoU; reference code: yolo.cu#L129

The YOLOv8-OBB decode implementation is therefore easy to write, as shown below:
```cpp
static __global__ void decode_kernel(
    float* predict, int num_bboxes, int num_classes, float confidence_threshold,
    float* invert_affine_matrix, float* parray, int max_objects
){
    // cx, cy, w, h, cls, angle
    int position = blockDim.x * blockIdx.x + threadIdx.x;
    if (position >= num_bboxes) return;

    float* pitem            = predict + (5 + num_classes) * position;
    float* class_confidence = pitem + 4;
    float confidence        = *class_confidence++;
    int label               = 0;
    for(int i = 1; i < num_classes; ++i, ++class_confidence){
        if(*class_confidence > confidence){
            confidence = *class_confidence;
            label      = i;
        }
    }

    if(confidence < confidence_threshold)
        return;

    int index = atomicAdd(parray, 1);
    if(index >= max_objects)
        return;

    float cx     = *pitem++;
    float cy     = *pitem++;
    float width  = *pitem++;
    float height = *pitem++;
    float angle  = *(pitem + num_classes);
    affine_project(invert_affine_matrix, cx, cy, width, height, &cx, &cy, &width, &height);

    float* pout_item = parray + 1 + index * NUM_BOX_ELEMENT;
    *pout_item++ = cx;
    *pout_item++ = cy;
    *pout_item++ = width;
    *pout_item++ = height;
    *pout_item++ = angle;
    *pout_item++ = confidence;
    *pout_item++ = label;
    *pout_item++ = 1;  // 1 = keep, 0 = ignore
}
```
The decode implementation launches one thread per box; each thread decodes a single box, mapping its coordinates back onto the original image through the inverse affine matrix IM. Note that the angle lives in the last dimension and the class scores in the middle: the 20 values of a rotated box are laid out as [cx, cy, w, h, cls*15, angle].
As for the NMS part, since YOLOv8-OBB uses ProbIoU to measure the similarity of two rotated boxes, it also needs adapting; the adjusted NMS code is as follows:
```cpp
static __device__ void convariance_matrix(float w, float h, float r, float& a, float& b, float& c){
    float a_val = w * w / 12.0f;
    float b_val = h * h / 12.0f;
    float cos_r = cosf(r);
    float sin_r = sinf(r);

    a = a_val * cos_r * cos_r + b_val * sin_r * sin_r;
    b = a_val * sin_r * sin_r + b_val * cos_r * cos_r;
    c = (a_val - b_val) * sin_r * cos_r;
}

static __device__ float box_probiou(
    float cx1, float cy1, float w1, float h1, float r1,
    float cx2, float cy2, float w2, float h2, float r2,
    float eps = 1e-7
){
    // Calculate the prob iou between oriented bounding boxes, https://arxiv.org/pdf/2106.06072v1.pdf.
    float a1, b1, c1, a2, b2, c2;
    convariance_matrix(w1, h1, r1, a1, b1, c1);
    convariance_matrix(w2, h2, r2, a2, b2, c2);

    float t1 = ((a1 + a2) * powf(cy1 - cy2, 2) + (b1 + b2) * powf(cx1 - cx2, 2)) / ((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2) + eps);
    float t2 = ((c1 + c2) * (cx2 - cx1) * (cy1 - cy2)) / ((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2) + eps);
    float t3 = logf(((a1 + a2) * (b1 + b2) - powf(c1 + c2, 2)) / (4 * sqrtf(fmaxf(a1 * b1 - c1 * c1, 0.0f)) * sqrtf(fmaxf(a2 * b2 - c2 * c2, 0.0f)) + eps) + eps);
    float bd = 0.25f * t1 + 0.5f * t2 + 0.5f * t3;
    bd = fmaxf(fminf(bd, 100.0f), eps);
    float hd = sqrtf(1.0f - expf(-bd) + eps);
    return 1 - hd;
}

static __global__ void nms_kernel(float* bboxes, int max_objects, float threshold){

    int position = (blockDim.x * blockIdx.x + threadIdx.x);
    int count = min((int)*bboxes, max_objects);
    if (position >= count)
        return;

    // cx, cy, w, h, angle, confidence, class_label, keepflag
    float* pcurrent = bboxes + 1 + position * NUM_BOX_ELEMENT;
    for(int i = 0; i < count; ++i){
        float* pitem = bboxes + 1 + i * NUM_BOX_ELEMENT;
        if(i == position || pcurrent[6] != pitem[6]) continue;

        if(pitem[5] >= pcurrent[5]){
            if(pitem[5] == pcurrent[5] && i < position)
                continue;

            float iou = box_probiou(
                pcurrent[0], pcurrent[1], pcurrent[2], pcurrent[3], pcurrent[4],
                pitem[0],    pitem[1],    pitem[2],    pitem[3],    pitem[4]
            );

            if(iou > threshold){
                pcurrent[7] = 0;  // 1=keep, 0=ignore
                return;
            }
        }
    }
}
```
The NMS implementation also launches one thread per box, each thread handling one box: if another remaining box has a higher confidence than the current thread's box, their ProbIoU is computed, and its value decides whether the current box is kept. Compared with the CPU version, one loop level is removed; that loop is instead handled by CUDA's parallel threads. Code reference: yolo_decode.cu#L81

Having analyzed the C++ preprocessing and postprocessing, the whole inference pipeline is clear: the YOLOv8-OBB preprocessing can reuse YOLOv5's directly, while the decode and NMS parts of the postprocessing need small modifications.
Run the following command in the terminal to run inference (note: I will cover the complete workflow later; this is just a quick demonstration):
```shell
make yolo_obb
```
The build process is illustrated below:

The inference result is shown below:

With that, the complete YOLOv8-OBB inference pipeline runs in C++; next we walk through the full workflow from start to finish.
I have created a new repository, tensorRT_Pro-YOLOv8, based on shouxieai/tensorRT_Pro and adapted to support the various YOLOv8 tasks; it currently covers classification, detection, segmentation, pose estimation, and oriented object detection.

Let's now look at how to run YOLOv8-OBB inference with the tensorRT_Pro-YOLOv8 repo.

The tensorRT_Pro-YOLOv8 code can be downloaded directly from GitHub at https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8; on Linux, clone it with:
```shell
git clone https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8
```
You can also download it manually by clicking the Code button in the top-right corner. With that, the project is ready. Alternatively, download the source snapshot I prepared at here【pwd:yolo】 (note: the code was downloaded on 2024/1/21; if it has changed since, please refer to the latest version).

The required software environment includes TensorRT, CUDA, cuDNN, OpenCV, and Protobuf. For installing all of these, see Ubuntu20.04软件安装大全; I won't repeat it here — please set up the environment yourself.