
PyTorch: comparing CPU, CUDA, and TensorRT inference

0: Results first

An image is processed with a ResNet-based model.

Original image and result

Inference time with CPU, CUDA, and TensorRT:

Method      Time
CPU         13 s 594 ms
CUDA        711 ms
TensorRT    113 ms

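Project repository:

GitHub - july1992/Pytorch-vily-study: vily's study of PyTorch, machine learning, and inference acceleration

Models:

CPU + CUDA:

Deeplabv3 | PyTorch

TensorRT: an ONNX model file is required, so the ResNet ONNX model from NVIDIA's official guide is used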

Quick Start Guide :: NVIDIA Deep Learning TensorRT Documentation

wget https://download.onnxruntime.ai/onnx/models/resnet50.tar.gz

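1: Learning process

A GPU is required, so I bought a GPU-equipped Ubuntu server (version 20.x or later; GPU: 3060, 12 GB) on xxxx宝.

1.1 Check the server's GPU and driver information

nvidia-smi

1.2: Install the CUDA build of PyTorch on Linux; an earlier release can be chosen if needed

1.3: Currently installed versions: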

Python 3.11.5

cuda_11.7

PyTorch 2.3.0

CUDA available with version: 11.8

cuDNN version: 870

TensorRT: 10.2.0
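The version list above can be collected from Python in one go. This is only a minimal sketch: `collect_versions` is a helper name I made up, and it reports `None` for any package that is not installed instead of failing.

```python
import platform


def collect_versions():
    """Return a dict of version info; packages that are missing map to None."""
    versions = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        versions.update({"torch": None, "cuda": None, "cudnn": None,
                         "cuda_available": False})
    else:
        versions["torch"] = torch.__version__
        # CUDA version this PyTorch build was compiled against (None on CPU builds)
        versions["cuda"] = torch.version.cuda
        versions["cudnn"] = torch.backends.cudnn.version()
        versions["cuda_available"] = torch.cuda.is_available()
    try:
        import tensorrt
        versions["tensorrt"] = tensorrt.__version__
    except ImportError:
        versions["tensorrt"] = None
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

The CUDA version reported by PyTorch is the one bundled with the PyTorch build, which can differ from the system toolkit version.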

1.4: ResNet-50 is used for testing here

Model: Deeplabv3 | PyTorch

1.5 Code walkthrough:

    import torch
    model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
    model.eval()

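This downloads the model into /home/wuyou/.cache/torch/hub/. If the download fails, you can download the file manually and put it in the corresponding location; remember to rename it to the file name the loader expects.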
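To know where a manually downloaded file has to go, it helps to compute the cache directory. The sketch below mirrors the default lookup order as I understand it (`TORCH_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`); the function names are my own, and `torch.hub.get_dir()` remains the authoritative answer on a machine with PyTorch installed.

```python
import os


def default_hub_dir(env=None):
    """Return the default torch hub directory for the given environment mapping."""
    env = os.environ if env is None else env
    torch_home = env.get("TORCH_HOME")
    if not torch_home:
        cache_home = env.get("XDG_CACHE_HOME") or os.path.join(
            os.path.expanduser("~"), ".cache")
        torch_home = os.path.join(cache_home, "torch")
    return os.path.join(os.path.expanduser(torch_home), "hub")


def checkpoint_dir(env=None):
    """Weight files downloaded by torch.hub are stored in <hub_dir>/checkpoints."""
    return os.path.join(default_hub_dir(env), "checkpoints")


if __name__ == "__main__":
    print("hub dir:    ", default_hub_dir())
    print("checkpoints:", checkpoint_dir())
```

When torch.hub downloads weights it prints the source URL and the target path, so the expected file name can be read from that message: it is the last path component of the URL.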

2: CPU vs. CUDA comparison

2.1 Code for CPU and CUDA

    import torch
    from datetime import datetime

    now = datetime.now()
    print('0--', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])

    model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
    # or any of these variants
    # model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101', pretrained=True)
    # model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_mobilenet_v3_large', pretrained=True)
    model.eval()
    # print('model:', model)

    now = datetime.now()
    print('1--', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])

    # sample execution (requires torchvision)
    from PIL import Image
    from torchvision import transforms

    input_image = Image.open('img/dog.jpg')
    input_image = input_image.convert("RGB")

    # Define the image transforms (these should match the ones used when training the model)
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Apply the transforms to the image
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

    now = datetime.now()
    print('2--before', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])

    # move the input and model to GPU for speed if available
    if torch.cuda.is_available():
        print('using cuda')
        input_batch = input_batch.to('cuda')
        model.to('cuda')

    # Run the model on the input
    with torch.no_grad():
        print('inside no_grad')
        output = model(input_batch)['out'][0]
    output_predictions = output.argmax(0)

    now = datetime.now()
    print('2--after', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    print(output_predictions[0])

    # import numpy as np
    # # use np.ndarray
    # ## convert the predictions to a numpy array
    palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
    colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
    colors = (colors % 255).numpy().astype("uint8")

    # plot the semantic segmentation predictions of 21 classes in each color
    r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
    r.putpalette(colors)
    # now = datetime.now()
    # print('3--', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    r.save('img1.png')

    # import matplotlib.pyplot as plt
    # plt.imshow(r)
    # plt.show()
    # input("Press Enter to close...")

2.2 To run on the CPU, comment out the following block

    if torch.cuda.is_available():
        print('using cuda')
        input_batch = input_batch.to('cuda')
        model.to('cuda')

2.3 Results of each run

CPU: 13 s 594 ms

CUDA: 711 ms

CUDA is about 19 times faster.
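The timestamps in the script from 2.1 bracket a single run, which on the GPU also covers moving the model to the device, and CUDA kernels are launched asynchronously. For a steadier number, a warm-up plus an explicit synchronization step can help. Below is a minimal sketch; `time_inference` and its parameters are names I chose for illustration, not part of PyTorch.

```python
import time


def time_inference(fn, warmup=3, runs=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is an optional callable that blocks until queued device work is
    finished (for CUDA, torch.cuda.synchronize can be passed here).
    """
    for _ in range(warmup):      # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()                   # wait for the last run before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / runs
```

With the model from 2.1 this could be called inside `torch.no_grad()` as `time_inference(lambda: model(input_batch), sync=torch.cuda.synchronize)`; on the CPU, `sync` can simply be left out.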

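2: Using TensorRT

Why use TensorRT: it accelerates model inference and pushes GPU utilization to the limit; the official claim is a 4-6x speedup.

2.1 Set up the TensorRT environment; see the previous article

TensorRT installation and testing - CSDN blog

2.2 Download an ONNX model; for why ONNX is used, see the videos on Bilibili

Quick Start Guide :: NVIDIA Deep Learning TensorRT Documentation

After extracting the archive, the folder contains model.onnx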
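The extraction step can also be scripted. This is a small sketch using only the standard library; `extract_onnx` is a hypothetical helper name, and it assumes the archive has already been downloaded (for example with the wget command shown earlier).

```python
import os
import tarfile


def extract_onnx(archive_path, dest_dir="."):
    """Extract a .tar.gz archive and return the paths of the .onnx files in it."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = [m.name for m in archive.getmembers() if m.name.endswith(".onnx")]
        archive.extractall(dest_dir)
    return [os.path.join(dest_dir, name) for name in names]


if __name__ == "__main__":
    if os.path.isfile("resnet50.tar.gz"):
        print(extract_onnx("resnet50.tar.gz"))
```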

2.3 Convert the model.onnx above into a TensorRT engine

trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt

A few bugs came up at this step; they are described in the Bugs section of this article.
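The same conversion can also be attempted from Python instead of trtexec. What follows is a rough sketch, assuming the TensorRT 10 Python API (the version listed in 1.3); `build_engine` and its parameters are my own naming, and I have not verified that it produces the same engine as the trtexec command.

```python
import os


def build_engine(onnx_path, engine_path, fp16=False, workspace_bytes=1 << 30):
    """Parse an ONNX file and write a serialized TensorRT engine to engine_path."""
    if not os.path.isfile(onnx_path):
        raise FileNotFoundError(onnx_path)

    # Imported here so the path check above works even without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # assumption: TensorRT 10, explicit batch by default
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return engine_path
```

A call such as `build_engine("resnet50/model.onnx", "resnet_engine.trt")` would mirror the trtexec command above.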

2.4 Deploy the model

Following the official example

    # Create a .py file with the following content
    import numpy as np
    from onnx_helper import ONNXClassifierWrapper

    PRECISION = np.float32
    BATCH_SIZE = 32
    N_CLASSES = 1000  # Our ResNet-50 is trained on a 1000 class ImageNet task

    trt_model = ONNXClassifierWrapper("resnet_engine.trt", [BATCH_SIZE, N_CLASSES], target_dtype=PRECISION)
    dummy_input_batch = np.zeros((BATCH_SIZE, 224, 224, 3), dtype=PRECISION)
    predictions = trt_model.predict(dummy_input_batch)
    print('result:', predictions[0])

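This fails with an error saying onnx_helper cannot be found, along with a few other bugs; they are covered in the Bugs section of this article.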
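The demo prints `predictions[0]`, which is one row of N_CLASSES scores. To read it as a classification result, the scores can be ranked. Here is a minimal pure-Python sketch; `top_k` is a hypothetical helper, and it assumes the engine returns raw scores. If the model already ends in a softmax, the ranking is unchanged and only the printed probabilities differ.

```python
import math


def top_k(scores, k=5):
    """Return [(class_index, probability), ...] for the k highest scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


if __name__ == "__main__":
    for index, prob in top_k([0.1, 2.5, 0.3, 1.7], k=2):
        print(index, round(prob, 3))
```

With the wrapper above, `top_k(predictions[0].tolist())` would give the five most likely ImageNet class indices for the first image in the batch.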

2.5 Output:

2.6 Modify the demo to load an image

    import numpy as np
    import torch
    from datetime import datetime
    from onnx_helper import ONNXClassifierWrapper

    PRECISION = np.float32
    BATCH_SIZE = 32
    N_CLASSES = 1000  # Our ResNet-50 is trained on a 1000 class ImageNet task

    # Get the current time and print it, including milliseconds
    now = datetime.now()
    print('1--', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])

    trt_model = ONNXClassifierWrapper("resnet_engine.trt", [BATCH_SIZE, N_CLASSES], target_dtype=PRECISION)
    # dummy_input_batch = np.zeros((BATCH_SIZE, 224, 224, 3), dtype=PRECISION)

    from PIL import Image
    from torchvision import transforms

    input_image = Image.open('dog.jpg')
    input_image = input_image.convert("RGB")
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
    # print(dummy_input_batch[0])

    now = datetime.now()
    print('2--before', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])

    dummy_input_batch = input_batch.numpy()
    predictions = trt_model.predict(dummy_input_batch)

    now = datetime.now()
    print('3--after', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    # print('result:', predictions[0])

    output_predictions = predictions
    # plot the semantic segmentation predictions of 21 classes in each color
    r = Image.fromarray(output_predictions, 'L').resize(input_image.size)
    # now = datetime.now()
    # print('4--', now.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
    r.save('img1.png')

2.7 Result: 113 ms
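The speedup factors follow directly from the three measured times. A quick check of the arithmetic is below; keep in mind that the CPU and CUDA runs used deeplabv3_resnet50 while the TensorRT run used the ResNet-50 ONNX model, so the TensorRT ratios are only a rough indication.

```python
# Measured times from this article, in milliseconds
timings_ms = {"cpu": 13 * 1000 + 594, "cuda": 711, "tensorrt": 113}


def speedup(baseline, target):
    """How many times faster target is than baseline."""
    return timings_ms[baseline] / timings_ms[target]


print(f"CUDA vs CPU:      {speedup('cpu', 'cuda'):.1f}x")       # 19.1x
print(f"TensorRT vs CUDA: {speedup('cuda', 'tensorrt'):.1f}x")  # 6.3x
print(f"TensorRT vs CPU:  {speedup('cpu', 'tensorrt'):.1f}x")   # 120.3x
```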

3: Bugs

3.1 Running trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt fails with

TensorRT trtexec: command not found

Fix: add the following environment variables to ~/.bashrc

    export LD_LIBRARY_PATH=/vily/TensorRT-10.2.0.19/lib:$LD_LIBRARY_PATH
    export PATH=/vily/TensorRT-10.2.0.19/bin:$PATH
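After editing ~/.bashrc (and opening a new shell or running source ~/.bashrc), it is easy to check whether trtexec is now visible. A small sketch with the standard library; `find_tool` is a name I made up.

```python
import os
import shutil


def find_tool(name="trtexec", extra_dirs=()):
    """Return the full path of an executable, or None if it is not on the search path."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_tool("trtexec")
    print(location if location else "trtexec is not on PATH")
```

Passing the install directory from above, as in `find_tool("trtexec", ["/vily/TensorRT-10.2.0.19/bin"])`, checks that location without touching PATH.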

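3.2 onnx is already installed, but the error still says onnx_helper is missing

or

No matching distribution found for onnx_helper

Fix:

Find the official onnx_helper:

TensorRT/quickstart/IntroNotebooks/onnx_helper.py at release/10.0 · NVIDIA/TensorRT · GitHub

Download the file and put it in the current directory.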
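As the pip error suggests, onnx_helper is not an installable package but a single file, so Python only finds it when that file sits next to the script or elsewhere on sys.path. A quick way to check where a module would be loaded from (`module_location` is my own helper name):

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("onnx_helper:", module_location("onnx_helper"))
```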

3.3 Execution fails because execute_async_v2 cannot be found

Fix:

Find this line in the code:

self.context.execute_async_v2(self.bindings, self.stream.handle, None)

and change it to

self.context.execute_async_v3(self.stream.handle)
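Since execute_async_v3 no longer takes the bindings list, each input/output tensor needs its device address registered on the context by name before the call. Roughly, the pattern looks like the sketch below; `run_v3` is a hypothetical helper, and it assumes `device_pointers` is ordered the same way as the engine's I/O tensor indices (input first, then output, for this model).

```python
def run_v3(context, engine, device_pointers, stream_handle):
    """Bind one device pointer per I/O tensor, then enqueue inference on the stream."""
    if len(device_pointers) != engine.num_io_tensors:
        raise ValueError("need exactly one device pointer per I/O tensor")
    for index, pointer in enumerate(device_pointers):
        name = engine.get_tensor_name(index)            # tensor names replace binding indices
        context.set_tensor_address(name, int(pointer))
    return context.execute_async_v3(stream_handle)
```

Inside the wrapper class this would be called as `run_v3(self.context, self.engine, [self.d_input, self.d_output], self.stream.handle)`.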

3.4 Other errors

or

Fix: use the modified onnx_helper: Pytorch-vily-study/onxx/onnx_helper.py at base-platform · july1992/Pytorch-vily-study · GitHub

    #
    # SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    # SPDX-License-Identifier: Apache-2.0
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    import numpy as np
    #import tensorflow as tf
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit

    # For ONNX:
    class ONNXClassifierWrapper():
        def __init__(self, file, num_classes, target_dtype = np.float32):
            self.target_dtype = target_dtype
            self.num_classes = num_classes
            self.load(file)
            self.stream = None

        def load(self, file):
            runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
            # Modified here
            with open(file, "rb") as f:
                self.engine = runtime.deserialize_cuda_engine(f.read())
            self.context = self.engine.create_execution_context()

        def allocate_memory(self, batch):
            self.output = np.empty(self.num_classes, dtype = self.target_dtype) # Need to set both input and output precisions to FP16 to fully enable FP16
            # Allocate device memory
            self.d_input = cuda.mem_alloc(1 * batch.nbytes)
            self.d_output = cuda.mem_alloc(1 * self.output.nbytes)
            self.bindings = [int(self.d_input), int(self.d_output)]
            self.stream = cuda.Stream()

        def predict(self, batch): # result gets copied into output
            if self.stream is None:
                self.allocate_memory(batch)
            print('1--')
            # Transfer input data to device
            cuda.memcpy_htod_async(self.d_input, batch, self.stream)
            # Execute model
            print('2--')
            # Modified here: register the input and output addresses by tensor name
            self.context.set_tensor_address(self.engine.get_tensor_name(0), int(self.d_input))
            self.context.set_tensor_address(self.engine.get_tensor_name(1), int(self.d_output))
            # Also modified here: execute_async_v3 takes only the stream handle
            self.context.execute_async_v3(self.stream.handle)
            # Transfer predictions back
            print('3--')
            cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
            # Synchronize threads
            print('4--')
            self.stream.synchronize()
            return self.output

    # Kept from NVIDIA's original helper; not called in this article, where the
    # engine is built with trtexec. It relies on older builder attributes
    # (max_workspace_size, fp16_mode, build_cuda_engine), so it may not run on TensorRT 10.
    def convert_onnx_to_engine(onnx_filename, engine_filename = None, max_batch_size = 32, max_workspace_size = 1 << 30, fp16_mode = True):
        logger = trt.Logger(trt.Logger.WARNING)
        with trt.Builder(logger) as builder, builder.create_network() as network, trt.OnnxParser(network, logger) as parser:
            builder.max_workspace_size = max_workspace_size
            builder.fp16_mode = fp16_mode
            builder.max_batch_size = max_batch_size

            print("Parsing ONNX file.")
            with open(onnx_filename, 'rb') as model:
                if not parser.parse(model.read()):
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))

            print("Building TensorRT engine. This may take a few minutes.")
            engine = builder.build_cuda_engine(network)

            if engine_filename:
                with open(engine_filename, 'wb') as f:
                    f.write(engine.serialize())

            return engine, logger
