
Building TensorRT from Source in Docker
1. Pick the source version matching your CUDA/cuDNN

TensorRT source: https://github.com/NVIDIA/TensorRT/tree/20.03
CUDA 10.2 + cuDNN 7.6.5: the newest supported release is TensorRT 7.0.0.
TensorRT 7.1 and later all require cuDNN 8.0 or newer, so choose the source version that matches your stack.

git clone -b release/7.0 https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
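To confirm which versions your container actually has, a minimal check, assuming a standard install under /usr/local/cuda (the cudnn.h location varies by image):

# Check the CUDA toolkit version
nvcc --version
# cuDNN 7.x version macros live in cudnn.h; check the common header locations
grep -E "CUDNN_MAJOR|CUDNN_MINOR|CUDNN_PATCHLEVEL" /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h 2>/dev/null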
2. Download the TensorRT core library (tar package)

Download the tar package that matches CUDA 10.2 / cuDNN 7.6 from https://developer.nvidia.com/nvidia-tensorrt-7x-download

# Extract the archive
tar -xvzf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz
# Copy the extracted TensorRT-7.0.0.11 into the TensorRT source tree cloned above
# It is recommended to place the combined tree under /opt/TensorRT7
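A condensed sketch of the copy step described above, assuming the archive was extracted next to the cloned source:

mkdir -p /opt/TensorRT7
# Put the cloned source under /opt/TensorRT7, with the extracted release inside it
cp -r TensorRT /opt/TensorRT7/
cp -r TensorRT-7.0.0.11 /opt/TensorRT7/TensorRT/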
3. Build the TensorRT source
# Change into the copied source tree, /opt/TensorRT7/TensorRT
export TRT_SOURCE=`pwd`
export TRT_RELEASE=`pwd`/TensorRT-7.0.0.11
export TENSORRT_LIBRARY_INFER=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer.so.7
export TENSORRT_LIBRARY_INFER_PLUGIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7
export TENSORRT_LIBRARY_MYELIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libmyelin.so
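A quick sanity check that the prebuilt libraries these variables point to actually exist:

ls -l $TENSORRT_LIBRARY_INFER $TENSORRT_LIBRARY_INFER_PLUGIN $TENSORRT_LIBRARY_MYELIN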

Before building, verify that the CUDA and cuDNN versions in TensorRT/CMakeLists.txt are correct.
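If they do not match, the versions can also be overridden at configure time rather than by editing the file; a sketch using the cache variables documented in the TensorRT OSS README:

cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out \
         -DCUDA_VERSION=10.2 -DCUDNN_VERSION=7.6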

Run the build:

# gcc/g++ must not be newer than version 8; Ubuntu 20.04 ships gcc 9 and needs a downgrade (see the sketch after this block)
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
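One way to downgrade on Ubuntu 20.04, a sketch assuming the stock Ubuntu repositories:

apt-get update && apt-get install -y gcc-8 g++-8
# Point the gcc/g++ symlinks at version 8 via update-alternatives
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80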

The build may then fail with the following error:

-- Using src='https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz'
CMake Error at third_party.protobuf-stamp/download-third_party.protobuf.cmake:159 (message):
  Each download failed!

    error: downloading 'https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz' failed
         status_code: 1
         status_string: "Unsupported protocol"
         log:
         --- LOG BEGIN ---
         Protocol "https" not supported or disabled in libcurl

  Closing connection -1


Fix:

(1) Download the dependency manually: https://link.zhihu.com/?target=https%3A//github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz

(2) Copy protobuf-cpp-3.0.0.tar.gz into the project's /build/third_party.protobuf/src/ directory;

(3) Edit the download script

Open /build/third_party.protobuf/src/third_party.protobuf-stamp/download-third_party.protobuf.cmake under the project directory,

and delete everything from the first if(EXISTS) to the last line of the file;

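The same three steps as a condensed shell sketch (paths assume the /opt/TensorRT7 layout used throughout):

cd /opt/TensorRT7/TensorRT/build/third_party.protobuf/src
wget https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz
# Then open third_party.protobuf-stamp/download-third_party.protobuf.cmake
# and delete everything from the first if(EXISTS) to the end of the file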

After these three steps, rerun make -j$(nproc); if all goes well the build should now run to completion.

4. Test

Enter TensorRT-7.0.0.11/python and install the TensorRT Python bindings:

root@53737c0fc560:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# ls
tensorrt-7.0.0.11-cp27-none-linux_x86_64.whl  tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp34-none-linux_x86_64.whl  tensorrt-7.0.0.11-cp37-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp35-none-linux_x86_64.whl

python3.6 -m pip install tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl

Set the environment variables. Without the shared libraries on the loader path, importing tensorrt fails:

>>> import tensorrt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorrt/__init__.py", line 1, in <module>
    from .tensorrt import *
ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory

Adding the TensorRT library directories to LD_LIBRARY_PATH fixes this:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/build/

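To persist the path for future shells (or bake it into a Dockerfile via ENV), append the exports to ~/.bashrc; a sketch:

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/build/' >> ~/.bashrc

With the path set, the import succeeds: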
root@8069c274e477:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# python3
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt as trt
>>> print(trt.__version__)
7.0.0.11
>>>


Install the remaining bundled wheel files:

# Install graphsurgeon for each Python interpreter you use
python3.7 -m pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
python3 -m pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
python3 -m pip install pycuda
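A quick sanity check that the bindings import cleanly (a sketch; pycuda.autoinit requires a visible GPU):

python3 -c "import tensorrt, pycuda.autoinit; print('TensorRT', tensorrt.__version__)"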

For docker deployment, keep the TensorRT source and tar package under /opt and rebuild there, so the image does not depend on any files outside the container.

cd /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/samples
# Build the samples
CUDNN_INSTALL_DIR=/usr/local/cuda/ make -j8
# The resulting executables are placed in /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin


root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ls
chobj                         sample_googlenet        sample_mnist_api_debug      sample_plugin                  sample_uff_mnist
common                        sample_googlenet_debug  sample_mnist_debug          sample_plugin_debug            sample_uff_mnist_debug
dchobj                        sample_int8             sample_movielens            sample_reformat_free_io        sample_uff_plugin_v2_ext
giexec                        sample_int8_api         sample_movielens_debug      sample_reformat_free_io_debug  sample_uff_plugin_v2_ext_debug
sample_char_rnn               sample_int8_api_debug   sample_movielens_mps        sample_ssd                     sample_uff_ssd
sample_char_rnn_debug         sample_int8_debug       sample_movielens_mps_debug  sample_ssd_debug               sample_uff_ssd_debug
sample_dynamic_reshape        sample_mlp              sample_nmt                  sample_uff_faster_rcnn         trtexec
sample_dynamic_reshape_debug  sample_mlp_debug        sample_nmt_debug            sample_uff_faster_rcnn_debug   trtexec_debug
sample_fasterRCNN             sample_mnist            sample_onnx_mnist           sample_uff_mask_rcnn
sample_fasterRCNN_debug       sample_mnist_api        sample_onnx_mnist_debug     sample_uff_mask_rcnn_debug


Run a test:

root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ./sample_char_rnn
&&&& RUNNING TensorRT.sample_char_rnn # ./sample_char_rnn
[05/24/2021-17:20:05] [I] Building and running a GPU inference engine for Char RNN model...
[05/24/2021-17:20:05] [I] Done reading weights from file...
[05/24/2021-17:20:05] [I] Done constructing network...
[05/24/2021-17:20:05] [V] [TRT] Applying generic optimizations to the graph for inference.
[05/24/2021-17:20:05] [V] [TRT] Original: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After dead-layer removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After Myelin optimization: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After scale fusion: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After vertical fusions: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After final dead-layer removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After tensor merging: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After concat removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] Graph construction and optimization completed in 0.00174117 seconds.
[05/24/2021-17:20:07] [V] [TRT] Constructing optimization profile number 0 out of 1
*************** Autotuning format combination:  -> Float(1,512,33280) ***************
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination:  -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,512,512), Float(1,512,1024), Float(1,512,1024), Int32(1) -> Float(1,512,512), Float(1,512,1024), Float(1,512,1024) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 0) [RNN] (RNNv2)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 is the only option, timing skipped
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,512,33280), Float(1,512,512) -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 2) [Matrix Multiply] (MatrixMultiply)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 is the only option, timing skipped
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,1,65), Float(1,1,65) -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 4) [ElementWise] (ElementWise)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 1 time 0.004832
[05/24/2021-17:20:07] [V] [TRT] Tactic: 2 time 0.006144
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 1 Time: 0.004832
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,1,65) -> Float(1,1,1), Int32(1,1,1) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 5) [TopK] (TopK)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 time 0.00688
[05/24/2021-17:20:07] [V] [TRT] Tactic: 1 time 0.012288
[05/24/2021-17:20:07] [V] [TRT] Tactic: 3 time 0.009952
[05/24/2021-17:20:07] [V] [TRT] Tactic: 2 time 0.026624
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0.00688
[05/24/2021-17:20:07] [V] [TRT] Formats and tactics selection completed in 0.0161162 seconds.
[05/24/2021-17:20:07] [V] [TRT] After reformat layers: 6 layers
[05/24/2021-17:20:07] [V] [TRT] Block size 33554432
[05/24/2021-17:20:07] [V] [TRT] Block size 2048
[05/24/2021-17:20:07] [V] [TRT] Block size 512
[05/24/2021-17:20:07] [V] [TRT] Total Activation Memory: 33556992
[05/24/2021-17:20:07] [I] [TRT] Detected 4 inputs and 3 output network tensors.
[05/24/2021-17:20:07] [V] [TRT] Engine generation completed in 2.03795 seconds.
[05/24/2021-17:20:07] [V] [TRT] Engine Layer Information:
[05/24/2021-17:20:07] [V] [TRT] Layer(Constant): (Unnamed Layer* 1) [Constant], Tactic: 0,  -> (Unnamed Layer* 1) [Constant]_output[Float(65,512)]
[05/24/2021-17:20:07] [V] [TRT] Layer(Constant): (Unnamed Layer* 3) [Constant], Tactic: 0,  -> (Unnamed Layer* 3) [Constant]_output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(RNN): (Unnamed Layer* 0) [RNN], Tactic: 0, data[Float(1,512)], hiddenIn[Float(2,512)], cellIn[Float(2,512)], seqLen[Int32()] -> RNN output[Float(1,512)], hiddenOut[Float(2,512)], cellOut[Float(2,512)]
[05/24/2021-17:20:07] [V] [TRT] Layer(MatrixMultiply): (Unnamed Layer* 2) [Matrix Multiply], Tactic: 0, (Unnamed Layer* 1) [Constant]_output[Float(65,512)], RNN output[Float(1,512)] -> Matrix Multiplicaton output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(ElementWise): (Unnamed Layer* 4) [ElementWise], Tactic: 1, Matrix Multiplicaton output[Float(65,1)], (Unnamed Layer* 3) [Constant]_output[Float(65,1)] -> Add Bias output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(TopK): (Unnamed Layer* 5) [TopK], Tactic: 0, Add Bias output[Float(65,1)] -> (Unnamed Layer* 5) [TopK]_output_1[Float(1,1)], pred[Int32(1,1)]
[05/24/2021-17:20:07] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[05/24/2021-17:20:07] [I] RNN warmup sentence: JACK
[05/24/2021-17:20:07] [I] Expected output: INGHAM:
What shall I
[05/24/2021-17:20:07] [I] Received: INGHAM:
What shall I
&&&& PASSED TensorRT.sample_char_rnn # ./sample_char_rnn

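Besides the samples, the trtexec binary in the same bin directory is handy for quick engine building and benchmarking; a sketch, where model.onnx is a hypothetical placeholder for your own model:

# Build an FP16 engine from an ONNX model and report timing (model.onnx is a placeholder)
./trtexec --onnx=model.onnx --fp16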

Reference:

https://zhuanlan.zhihu.com/p/346307138
