
Deploying YOLOv5 Object Detection on a Jetson Nano with TensorRT Acceleration (Step-by-Step Tutorial)


I. What You Need

1. A card reader
2. An SD card
3. An Ethernet cable
4. A keyboard
5. A mouse

II. Flashing the SD Card

Flashing here just means wiping the machine back to a clean state: the Jetson Nano we are using had been used before, so the SD card needs to be erased and re-imaged.

When you plug in the card reader, several drives may pop up; you only need to format one of them.

Step 1:

Download the system image used for flashing; the link is given below:

JetPack SDK 4.4.1 archive | NVIDIA Developer

 

Unzip it once the download completes.

Step 2:

We also need the tool that flashes the SD card; the link is below: Get Started With Jetson Nano Developer Kit | NVIDIA Developer

The download is an installer package; just click through the installer.

Once it is installed, you can start flashing!
Then start the flash; the tool automatically runs two passes over the card (it writes the image, then verifies it).

When flashing finishes, insert the card into the Nano and boot it up.

III. Setting Up the Environment

1. Configure CUDA

Open a terminal and run:

sudo gedit ~/.bashrc

A document opens; scroll to the very bottom, append the following lines, then save and exit:

export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.2/bin:$PATH

Reload the file with source ~/.bashrc, then verify the configuration succeeded:

nvcc -V
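To additionally confirm the GPU itself is visible (optional), the CUDA samples that JetPack normally ships under /usr/local/cuda/samples include deviceQuery; building and running it is a quick check (the sample path is an assumption about a stock JetPack install):

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery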

2. Set Up a conda Environment

The Jetson Nano B01 is an aarch64 machine, a different architecture from typical x86 Windows and Linux desktops, so Anaconda itself cannot be installed; Archiconda is an aarch64 substitute that works in its place.

Run:

wget https://github.com/Archiconda/build-tools/releases/download/0.2.3/Archiconda3-0.2.3-Linux-aarch64.sh

Once it has downloaded, run:

bash Archiconda3-0.2.3-Linux-aarch64.sh

When that finishes, check whether it succeeded:

conda -V

With it installed, configure the environment:

sudo gedit ~/.bashrc

A document opens; same drill as before: add one line at the bottom, save, and exit:

export PATH=~/archiconda3/bin:$PATH

3. Create Your Own Virtual Environment

conda create -n xxx python=3.6    # create a Python 3.6 virtual environment (xxx = your environment name)
conda activate xxx                # enter the virtual environment
conda deactivate                  # leave the virtual environment

On some setups conda activate fails to enter the environment; if that happens, this command gets you in:

source activate xxx
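For example, with a hypothetical environment named yolov5 (the name is just an example):

conda create -n yolov5 python=3.6
source activate yolov5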

4. Switch apt Sources

First, back up the sources.list file; the command prints nothing when it succeeds:

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak 

Open the file:

sudo gedit /etc/apt/sources.list

Delete everything in it, replace it with the following, then save and exit:

deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe

Then refresh the local package index:

sudo apt-get update

Upgrade the installed packages:

sudo apt-get upgrade

This step may report errors; if it does, resolve them before continuing.

Upgrade all packages and resolve dependencies:

sudo apt-get dist-upgrade

 

5. Install pip

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev

Upgrade pip to the latest version:

pip3 install --upgrade pip    # skip this if pip is already the latest

6. Download torch and torchvision

Download the corresponding builds from NVIDIA; I used torch 1.8.0.

Link: PyTorch for Jetson - version 1.10 now available - Jetson Nano - NVIDIA Developer Forums

Downloading torch and torchvision on the Jetson Nano itself tends to be painfully slow, or the pages simply will not load, so we recommend a tool (MobaXterm) for transferring files between your PC and the board; alternatively, you can copy them over with a USB drive.

This walkthrough covers installing and using MobaXterm:

MobaXterm (terminal tool) download, install & usage tutorial - CSDN blog

7. Install torch

Open a terminal, enter the virtual environment you created, and run:

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev 
pip install torch-1.8.0-cp36-cp36m-linux_aarch64.whl

At this point numpy needs to be installed as well, otherwise the torch test below will not report success:

sudo apt install python3-numpy

Test whether torch installed correctly; first start the Python interpreter, then import it:

python3
import torch
print(torch.__version__)

If the test crashes with "Illegal instruction (core dumped)", the fix is:

export OPENBLAS_CORETYPE=ARMV8
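To make that fix stick across reboots, append the same export to ~/.bashrc. While you are at it, a quick hedged sanity check (assuming the CUDA setup from step 1) confirms torch can see the GPU:

echo 'export OPENBLAS_CORETYPE=ARMV8' >> ~/.bashrc
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"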

8. Install torchvision. Execute the following commands one by one (if the third errors out, ignore it and continue with the fourth and fifth; if it does not error, go straight to cd ..):

cd torchvision
export BUILD_VERSION=0.9.0
sudo python setup.py install
python setup.py build
python setup.py install
cd ..
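The steps above assume the torchvision v0.9.0 source tree is already on the board (0.9.0 is the release that pairs with torch 1.8.0). If you still need to fetch it, and the network allows, it can be cloned with git:

git clone --branch v0.9.0 https://github.com/pytorch/vision torchvision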

Test torchvision the same way as above:

python
import torchvision
print(torchvision.__version__)

If the test raises a PIL error, install pillow with the commands below (if the second command hits a permission error, prefix it with sudo or append --user):

sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
python3 -m pip install -i https://mirrors.aliyun.com/pypi/simple pillow
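With pillow in place, a single hedged one-liner confirms the pair installed consistently (expect roughly 1.8.0 and 0.9.0):

python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"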

IV. Testing YOLOv5

1. Download the release you need from GitHub; I used v5.0. Be sure to download the matching weights at the same time. The code is here:

ultralytics/yolov5 at v5.0 (github.com)

The weights are here:

Releases · ultralytics/yolov5 (github.com) 

The YOLOv5 code and the weights must come from the same release, otherwise you will get errors; I stepped in that hole myself.

Then push the YOLOv5 source and the model to the home directory with MobaXterm, open a terminal in the YOLOv5 folder, and run:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

This step may fail with a numpy error; the cause is a numpy version that is too new. Downgrade it to 1.19.4:

pip install numpy==1.19.4 -i https://pypi.tuna.tsinghua.edu.cn/simple

Another possible error is just a network hiccup; rerun the command and it resolves itself.

Installation then continues. When it reaches opencv you will see "Building wheel for opencv-python (pyproject.toml)"; that means it is close to done, but this stage is very slow, so be patient!

Once all the dependencies are installed, drag the weight file into the root of the yolov5 folder, open a terminal there, and run:

python3 detect.py --weights yolov5s.pt 

You may hit an error here about a missing SPPF class; the fix is to add the following code to common.py:

import warnings

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))
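With that class added, detect.py should run. A couple of hedged usage variations can also help at this point (these flags exist in v5.0's detect.py; the weight path is an example):

python3 detect.py --weights yolov5s.pt --source 0        # detect on a USB webcam instead of the sample images
python3 detect.py --weights yolov5s.pt --conf-thres 0.4  # keep only higher-confidence detections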

V. Deploying YOLOv5 with TensorRT

1. Download tensorrtx, making sure you take the yolov5 v5.0 version. Link:

https://gitcode.net/mirrors/wang-xinyu/tensorrtx?utm_source=csdn_github_accelerator

2. Transfer it to the Jetson Nano with MobaXterm. Find gen_wts.py in tensorrtx/yolov5, copy it into the YOLOv5 folder you just ran, open a terminal there, and generate the .wts file:

python3 gen_wts.py --w yolov5s.pt

3. Find the yololayer.h file in tensorrtx/yolov5; you can open it and tweak a few parameters there, though for the stock model basically nothing needs changing.

4. In the current directory, create a build folder (named new here) and compile:

mkdir new    # create the new folder
cd new       # enter it
cmake ..     # configure the project
# copy the .wts file generated above into the new folder
make

5. With the yolov5s.wts file generated above placed under tensorrtx/yolov5, open a terminal and serialize the engine:

sudo ./yolov5 -s yolov5s.wts yolov5s.engine s
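The trailing s selects the network scale. If you exported another size, the same pattern applies (hedged examples; each needs its own .wts generated from the matching .pt):

sudo ./yolov5 -s yolov5m.wts yolov5m.engine m
sudo ./yolov5 -s yolov5l.wts yolov5l.engine l
sudo ./yolov5 -s yolov5x.wts yolov5x.engine x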

6. Under tensorrtx-yolov5-v5.0/yolov5, create a folder containing a test image and run detection on it; the last argument must be the path to that image folder (the command below assumes it is called sample):

sudo ./yolov5 -d yolov5s.engine ../sample

This run does not pop up a window; it writes annotated copies of the images instead. The live display appears once the camera is wired in below.

7. Using a camera

Modify yolov5.cpp, replacing its contents with the following code:

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

// change this list to your own class names
char* my_classes[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush" };

static int get_width(int x, float gw, int divisor = 8) {
    // return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    }
    else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

// create the engine and network
ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);
    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    /* ------ yolov5 backbone ------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");
    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");
    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());
    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");
    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());
    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");
    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);
    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));
    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;
    // Don't need the network any more
    network->destroy();
    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }
    return engine;
}

ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);
    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);
    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    /* ------ yolov5 backbone ------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");
    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");
    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");
    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");
    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");
    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");
    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);
    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));
    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;
    // Don't need the network any more
    network->destroy();
    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }
    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();
    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);
    // Serialize the engine
    (*modelStream) = engine->serialize();
    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    }
    else {
        return false;
    }
    return true;
}

int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);
    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;
    if (!parse_args(argc, argv, engine_name)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
        return -1;
    }
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << " read " << engine_name << " error! " << std::endl;
        return -1;
    }
    char* trtModelStream{ nullptr };
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();
    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
    // to read a local video file instead:
    // cv::VideoCapture capture("/home/nano/Videos/video.mp4");
    // open the local USB camera; mine defaults to index 1, and if 1 errors, change it to 0
    cv::VideoCapture capture(1);
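    // If you use a CSI ribbon camera rather than USB, OpenCV on JetPack is built with
    // GStreamer support, so a nvarguscamerasrc pipeline can be opened instead. This is a
    // hedged sketch, not from the original post; resolution and framerate are example values:
    // cv::VideoCapture capture("nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink", cv::CAP_GSTREAMER);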
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }
    int key;
    int fcount = 0;
    while (1)
    {
        cv::Mat frame;
        capture >> frame;
        if (frame.empty())
        {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }
        fcount++;
        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H);  // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }
        // Run inference
        auto start = std::chrono::system_clock::now();  // inference start time
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();  // inference end time
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                std::string jetson_fps = "FPS: " + std::to_string(fps);
                cv::putText(frame, jetson_fps, cv::Point(11, 80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
            }
            //cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
        }
        cv::imshow("yolov5", frame);
        key = cv::waitKey(1);
        if (key == 'q') {
            break;
        }
        fcount = 0;
    }
    capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}

If the file opens with the wrong application and you cannot edit its contents, be sure to open it as plain text.

After the edit, open a terminal and rebuild:

cd new
make
sudo ./yolov5 -v yolov5s.engine
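If the FPS seems low, the Nano's power mode and clocks are worth checking first; these two commands are standard JetPack utilities (not specific to this project):

sudo nvpmodel -m 0    # switch to the maximum power mode
sudo jetson_clocks    # lock the clocks at their maximums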

The camera feed should come up with live detections.

That completes deploying YOLOv5 object detection on a Jetson Nano with TensorRT acceleration.

 
