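Let's walk through the YOLOv8-Cls preprocessing pipeline and, along the way, add YOLOv8-Cls support to tensorRT_Pro.
Create a prediction script, predict-cls.py, in the YOLOv8 root directory with the following content: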
import cv2
from ultralytics import YOLO

if __name__ == "__main__":

    model = YOLO("yolov8s-cls.pt")

    img = cv2.imread("ultralytics/assets/bus.jpg")

    result = model(img)[0]
    names  = result.names

    top1_label = result.probs.top1
    top5_label = result.probs.top5
    top1_conf  = result.probs.top1conf
    top5_conf  = result.probs.top5conf
    top1_name  = names[top1_label]

    print(f"The model predicted category is {top1_name}, label = {top1_label}, confidence = {top1_conf:.4f}")
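In the code above we read an image with OpenCV and pass it to the model for inference, which returns a result object. The result object holds outputs for different tasks; since this is a classification task, we only need the label with the highest confidence among the 1000 classes.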
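As a rough illustration of what the top-1 and top-5 fields amount to, the sketch below ranks a probability vector with NumPy. The helper name top_k and the 6-class toy vector are assumptions made for this example; they are not part of the ultralytics API, and the real model output has 1000 entries.

```python
import numpy as np

def top_k(probs, k=5):
    """Return (labels, confidences) of the k most probable classes, best first."""
    probs = np.asarray(probs, dtype=np.float32)
    order = np.argsort(-probs)[:k]          # indices sorted by descending probability
    return order.tolist(), probs[order].tolist()

# Toy 6-class probability vector standing in for the real 1000-class output
probs = np.array([0.02, 0.08, 0.50, 0.25, 0.05, 0.10], dtype=np.float32)

top5_label, top5_conf = top_k(probs, k=5)
top1_label, top1_conf = top5_label[0], top5_conf[0]

print(f"top1 label = {top1_label}, confidence = {top1_conf:.4f}")
print(f"top5 labels = {top5_label}")
```

The top-1 label is simply the first entry of the top-5 list, which is why a single descending sort is enough for both.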
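Once the model predicts successfully, we need to write the YOLOv8-Cls preprocessing ourselves so that it is easier to port to C++ later.
Debugging shows that the YOLOv8-Cls preprocessing lives in ultralytics/data/augment.py; see augment.py#L1059: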
class CenterCrop:
    """YOLOv8 CenterCrop class for image preprocessing, designed to be part of a transformation pipeline, e.g.,
    T.Compose([CenterCrop(size), ToTensor()]).
    """

    def __init__(self, size=640):
        """Converts an image from numpy array to PyTorch tensor."""
        super().__init__()
        self.h, self.w = (size, size) if isinstance(size, int) else size

    def __call__(self, im):
        """
        Resizes and crops the center of the image using a letterbox method.

        Args:
            im (numpy.ndarray): The input image as a numpy array of shape HWC.

        Returns:
            (numpy.ndarray): The center-cropped and resized image as a numpy array.
        """
        imh, imw = im.shape[:2]
        m = min(imh, imw)  # min dimension
        top, left = (imh - m) // 2, (imw - m) // 2
        return cv2.resize(im[top:top + m, left:left + m], (self.w, self.h), interpolation=cv2.INTER_LINEAR)


class ToTensor:
    """YOLOv8 ToTensor class for image preprocessing, i.e., T.Compose([LetterBox(size), ToTensor()])."""

    def __init__(self, half=False):
        """Initialize YOLOv8 ToTensor object with optional half-precision support."""
        super().__init__()
        self.half = half

    def __call__(self, im):
        """
        Transforms an image from a numpy array to a PyTorch tensor, applying optional half-precision and normalization.

        Args:
            im (numpy.ndarray): Input image as a numpy array with shape (H, W, C) in BGR order.

        Returns:
            (torch.Tensor): The transformed image as a PyTorch tensor in float32 or float16, normalized to [0, 1].
        """
        im = np.ascontiguousarray(im.transpose((2, 0, 1))[::-1])  # HWC to CHW -> BGR to RGB -> contiguous
        im = torch.from_numpy(im)  # to torch
        im = im.half() if self.half else im.float()  # uint8 to fp16/32
        im /= 255.0  # 0-255 to 0.0-1.0
        return im
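To make the arithmetic in these two classes concrete, here is a small NumPy-only sketch. The helper names center_crop_window and to_chw_rgb are invented for illustration, the 1080x810 (height x width) size is just an assumed portrait image, and the cv2.resize step is deliberately left out so the example has no OpenCV dependency.

```python
import numpy as np

def center_crop_window(imh, imw):
    """Return (top, left, m): the largest centered square inside an imh x imw image."""
    m = min(imh, imw)                           # side length of the square crop
    top, left = (imh - m) // 2, (imw - m) // 2  # offsets that center the square
    return top, left, m

def to_chw_rgb(im):
    """HWC BGR uint8 -> CHW RGB float32 in [0, 1], mirroring the ToTensor steps."""
    chw = np.ascontiguousarray(im.transpose((2, 0, 1))[::-1])  # HWC -> CHW, BGR -> RGB
    return chw.astype(np.float32) / 255.0

# Assumed portrait image of height 1080 and width 810: the crop removes rows only
top, left, m = center_crop_window(1080, 810)
print(top, left, m)

# Tiny synthetic BGR image (4 rows, 6 columns) that is pure blue
im = np.zeros((4, 6, 3), dtype=np.uint8)
im[..., 0] = 255
t, l, side = center_crop_window(*im.shape[:2])
crop = im[t:t + side, l:l + side]
out = to_chw_rgb(crop)
print(crop.shape, out.shape)
```

After the channel reversal the blue plane ends up last (index 2) in the CHW tensor, which is the RGB ordering the model expects.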
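Based on these two classes, the preprocessing can be written as a single function:
import cv2
import torch
import numpy as np

def preprocess(img, dst_width=224, dst_height=224):
    # Center crop: keep the largest centered square
    imh, imw = img.shape[:2]
    m = min(imh, imw)
    top, left = (imh - m) // 2, (imw - m) // 2
    img_pre = img[top:top+m, left:left+m]
    # Resize the square crop to the network input size
    img_pre = cv2.resize(img_pre, (dst_width, dst_height), interpolation=cv2.INTER_LINEAR)
    # BGR -> RGB, scale to [0, 1]
    img_pre = (img_pre[...,::-1] / 255.0).astype(np.float32)
    # HWC -> CHW, then add the batch dimension
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre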
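The image after center cropping and resizing is shown below:
Because the softmax output is already the probability of each class, there is no post-processing step. YOLOv8-Cls inference therefore consists of image preprocessing and model inference, where the preprocessing is mainly center cropping and resizing. The complete inference code is as follows: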
import cv2
import torch
import numpy as np
from ultralytics.nn.autobackend import AutoBackend

def preprocess(img, dst_width=224, dst_height=224):
    imh, imw = img.shape[:2]
    m = min(imh, imw)
    top, left = (imh - m) // 2, (imw - m) // 2
    img_pre = img[top:top+m, left:left+m]
    img_pre = cv2.resize(img_pre, (dst_width, dst_height), interpolation=cv2.INTER_LINEAR)
    img_pre = (img_pre[...,::-1] / 255.0).astype(np.float32)
    img_pre = img_pre.transpose(2, 0, 1)[None]
    img_pre = torch.from_numpy(img_pre)
    return img_pre

if __name__ == "__main__":

    img = cv2.imread("ultralytics/assets/bus.jpg")

    img_pre = preprocess(img)

    model = AutoBackend(weights="yolov8s-cls.pt")
    names = model.names
    probs = model(img_pre)[0]

    top1_label = int(probs.argmax())
    top5_label = (-probs).argsort(0)[:5].tolist()
    top1_conf  = probs[top1_label]
    top5_conf  = probs[top5_label]
    top1name   = names[top1_label]

    print(f"The model predicted category is {top1name}, label = {top1_label}, confidence = {top1_conf:.4f}")
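At this point we have completed the whole YOLOv8-Cls inference process in Python. Next we implement it in C++.
For the C++ implementation we again use the tensorRT_Pro repo, so we now build YOLOv8-Cls inference in C++ on top of tensorRT_Pro.
First we need to export the YOLOv8-Cls model to ONNX. To fit tensorRT_Pro, the export code needs the following modification:
1. Change one place in ultralytics/engine/exporter.py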
# ========== exporter.py ==========

# line 323 of ultralytics/engine/exporter.py
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
#         dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
# Change to:

output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    dynamic['output'] = {0: 'batch'}
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
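That is the code change needed to fit tensorRT_Pro. Once it is in place, put the pretrained weights yolov8s-cls.pt in the ultralytics-main root directory and create an export script, export.py, with the following content: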
from ultralytics import YOLO
model = YOLO("yolov8s-cls.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
Run the following command in a terminal to export the ONNX model:
python export.py
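The export log shows that the PyTorch model has input shape 1x3x224x224 and output shape 1x1000, as expected.
After a successful export, yolov8s-cls.onnx is generated in the current directory. We can inspect it with the Netron visualization tool, as shown below:
The input node is named images with dimensions batchx3x224x224, and the output node is named output with dimensions batchx1000. In both cases only the batch dimension is dynamic, which matches the format tensorRT_Pro expects.
As mentioned earlier, YOLOv8-Cls preprocessing is mainly center cropping plus resizing. tensorRT_Pro already provides a resize implementation, so we only need to add the center crop.
The YOLOv8-Cls preprocessing kernel is therefore straightforward to write: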
__global__ void crop_resize_bilinear_and_normalize_kernel(
    uint8_t* src, int src_line_size, int src_width, int src_height, float* dst, int dst_width, int dst_height,
    int crop_x, int crop_y, float sx, float sy, Norm norm, int edge
){
    int position = blockDim.x * blockIdx.x + threadIdx.x;
    if (position >= edge) return;

    int dx      = position % dst_width;
    int dy      = position / dst_width;
    // Map the destination pixel center back to the source image, shifted by the crop offset
    float src_x = (dx + 0.5f) * sx - 0.5f + crop_x;
    float src_y = (dy + 0.5f) * sy - 0.5f + crop_y;
    float c0, c1, c2;

    int y_low = floorf(src_y);
    int x_low = floorf(src_x);
    int y_high = limit(y_low + 1, 0, src_height - 1);
    int x_high = limit(x_low + 1, 0, src_width - 1);
    y_low = limit(y_low, 0, src_height - 1);
    x_low = limit(x_low, 0, src_width - 1);

    // Fixed-point bilinear weights
    int ly    = rint((src_y - y_low) * INTER_RESIZE_COEF_SCALE);
    int lx    = rint((src_x - x_low) * INTER_RESIZE_COEF_SCALE);
    int hy    = INTER_RESIZE_COEF_SCALE - ly;
    int hx    = INTER_RESIZE_COEF_SCALE - lx;
    int w1    = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
    float* pdst = dst + dy * dst_width + dx * 3;
    uint8_t* v1 = src + y_low * src_line_size + x_low * 3;
    uint8_t* v2 = src + y_low * src_line_size + x_high * 3;
    uint8_t* v3 = src + y_high * src_line_size + x_low * 3;
    uint8_t* v4 = src + y_high * src_line_size + x_high * 3;

    c0 = resize_cast(w1 * v1[0] + w2 * v2[0] + w3 * v3[0] + w4 * v4[0]);
    c1 = resize_cast(w1 * v1[1] + w2 * v2[1] + w3 * v3[1] + w4 * v4[1]);
    c2 = resize_cast(w1 * v1[2] + w2 * v2[2] + w3 * v3[2] + w4 * v4[2]);

    // Optional BGR <-> RGB swap
    if(norm.channel_type == ChannelType::Invert){
        float t = c2;
        c2 = c0;  c0 = t;
    }

    if(norm.type == NormType::MeanStd){
        c0 = (c0 * norm.alpha - norm.mean[0]) / norm.std[0];
        c1 = (c1 * norm.alpha - norm.mean[1]) / norm.std[1];
        c2 = (c2 * norm.alpha - norm.mean[2]) / norm.std[2];
    }else if(norm.type == NormType::AlphaBeta){
        c0 = c0 * norm.alpha + norm.beta;
        c1 = c1 * norm.alpha + norm.beta;
        c2 = c2 * norm.alpha + norm.beta;
    }

    // Write the result in planar (CHW) layout
    int area = dst_width * dst_height;
    float* pdst_c0 = dst + dy * dst_width + dx;
    float* pdst_c1 = pdst_c0 + area;
    float* pdst_c2 = pdst_c1 + area;
    *pdst_c0 = c0;
    *pdst_c1 = c1;
    *pdst_c2 = c2;
}
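Compared with the plain resize implementation, the only addition is an offset (crop_x, crop_y), which performs the center crop. For the full code see preprocess_kernel.cu#L49.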
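The coordinate mapping in the kernel can be checked on the CPU with a short NumPy reference. This is only a sketch under stated assumptions: the host-side values crop_x, crop_y, sx and sy are not shown above, so the helper crop_resize_params (a name invented here) assumes they come from the largest centered square, and the normalization assumes AlphaBeta with alpha = 1/255, beta = 0 and a channel swap, matching the Python preprocessing. It uses floating-point weights instead of the kernel's fixed-point INTER_RESIZE_COEF_SCALE arithmetic, so results may differ slightly in the last digits.

```python
import numpy as np

def crop_resize_params(src_w, src_h, dst_w, dst_h):
    """Assumed host-side setup: centered square crop offsets and scale factors."""
    m = min(src_w, src_h)
    crop_x, crop_y = (src_w - m) // 2, (src_h - m) // 2
    return crop_x, crop_y, m / dst_w, m / dst_h

def crop_resize_normalize(src, dst_w, dst_h, alpha=1 / 255.0, beta=0.0):
    """Float reference: HWC BGR uint8 -> CHW RGB float32, center-cropped and resized."""
    src_h, src_w = src.shape[:2]
    crop_x, crop_y, sx, sy = crop_resize_params(src_w, src_h, dst_w, dst_h)

    # Same mapping as the kernel: destination pixel center -> source coordinate + crop offset
    src_x = (np.arange(dst_w) + 0.5) * sx - 0.5 + crop_x
    src_y = (np.arange(dst_h) + 0.5) * sy - 0.5 + crop_y

    x_low = np.floor(src_x).astype(int)
    y_low = np.floor(src_y).astype(int)
    x_high = np.clip(x_low + 1, 0, src_w - 1)
    y_high = np.clip(y_low + 1, 0, src_h - 1)
    x_low = np.clip(x_low, 0, src_w - 1)
    y_low = np.clip(y_low, 0, src_h - 1)

    # Bilinear weights, broadcast to (dst_h, dst_w, channels)
    lx = (src_x - x_low)[None, :, None]
    ly = (src_y - y_low)[:, None, None]

    img = src.astype(np.float32)
    v1 = img[np.ix_(y_low, x_low)]
    v2 = img[np.ix_(y_low, x_high)]
    v3 = img[np.ix_(y_high, x_low)]
    v4 = img[np.ix_(y_high, x_high)]
    out = (1 - ly) * (1 - lx) * v1 + (1 - ly) * lx * v2 + ly * (1 - lx) * v3 + ly * lx * v4

    out = out[..., ::-1] * alpha + beta          # channel swap, then AlphaBeta normalization
    return np.ascontiguousarray(out.transpose(2, 0, 1)).astype(np.float32)  # HWC -> CHW

params = crop_resize_params(810, 1080, 224, 224)
print(params[:2])

flat = np.full((6, 4, 3), 51, dtype=np.uint8)    # constant image, 51 / 255 = 0.2
flat_out = crop_resize_normalize(flat, 2, 2)
print(flat_out.shape)
```

Comparing the output of a function like this against the GPU tensor is one way to catch an off-by-one in the crop offsets before debugging the full pipeline.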
With the preprocessing analyzed, the whole inference process becomes clear: in C++, the YOLOv8-Cls preprocessing only requires a small modification to the existing resize.
make yolo_cls
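At this point we have completed the whole YOLOv8-Cls inference process in C++. Next we go through the full workflow from start to finish.
The author created a new repository, tensorRT_Pro-YOLOv8, which is based on shouxieai/tensorRT_Pro and adapted to support the various YOLOv8 tasks. It currently supports classification, detection, segmentation, and pose estimation.
Let's look at how to use the tensorRT_Pro-YOLOv8 repo to run YOLOv8-Cls inference.
The tensorRT_Pro-YOLOv8 code can be downloaded directly from GitHub at https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8. On Linux, clone it with: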
git clone https://github.com/Melody-Zhou/tensorRT_Pro-YOLOv8.git
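You can also download it manually by clicking the Code button in the upper-right corner. At this point the whole project is ready. Alternatively, click here to download the source code prepared by the author (note that this copy was downloaded on 2023/11/7; if the repo has changed since, refer to the latest version).
The required software environment is TensorRT, CUDA, cuDNN, OpenCV, and Protobuf. For installing all of them, see the Ubuntu20.04 software installation guide; it is not repeated here, and readers need to set up the environment themselves.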