
Building TensorRT from Source in Docker
1. Pick the source version matching your CUDA/cuDNN

TensorRT source: https://github.com/NVIDIA/TensorRT/tree/20.03
CUDA 10.2 + cuDNN 7.6.5: the newest supported release is TensorRT 7.0.0.
TensorRT 7.1 and later all require cuDNN 8.0 or newer, so choose the source version that matches your stack.

git clone -b release/7.0 https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
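To confirm which versions your container actually has, a minimal check, assuming a standard install under /usr/local/cuda (the cudnn.h location varies by image):

# Check the CUDA toolkit version
nvcc --version
# cuDNN 7.x version macros live in cudnn.h; check the common header locations
grep -E "CUDNN_MAJOR|CUDNN_MINOR|CUDNN_PATCHLEVEL" /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h 2>/dev/null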
2. Download the TensorRT core library (tar package)

Download the tar package that matches CUDA 10.2 / cuDNN 7.6 from https://developer.nvidia.com/nvidia-tensorrt-7x-download

# Extract the archive
tar -xvzf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz
# Copy the extracted TensorRT-7.0.0.11 into the TensorRT source tree cloned above
# It is recommended to place the combined tree under /opt/TensorRT7
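A condensed sketch of the copy step described above, assuming the archive was extracted next to the cloned source:

mkdir -p /opt/TensorRT7
# Put the cloned source under /opt/TensorRT7, with the extracted release inside it
cp -r TensorRT /opt/TensorRT7/
cp -r TensorRT-7.0.0.11 /opt/TensorRT7/TensorRT/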
3. Build the TensorRT source
# Change into the copied source tree, /opt/TensorRT7/TensorRT
export TRT_SOURCE=`pwd`
export TRT_RELEASE=`pwd`/TensorRT-7.0.0.11
export TENSORRT_LIBRARY_INFER=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer.so.7
export TENSORRT_LIBRARY_INFER_PLUGIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7
export TENSORRT_LIBRARY_MYELIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libmyelin.so
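A quick sanity check that the prebuilt libraries these variables point to actually exist:

ls -l $TENSORRT_LIBRARY_INFER $TENSORRT_LIBRARY_INFER_PLUGIN $TENSORRT_LIBRARY_MYELIN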

Before building, verify that the CUDA and cuDNN versions in TensorRT/CMakeLists.txt are correct.
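If they do not match, the versions can also be overridden at configure time rather than by editing the file; a sketch using the cache variables documented in the TensorRT OSS README:

cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out \
         -DCUDA_VERSION=10.2 -DCUDNN_VERSION=7.6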

Run the build:

# gcc/g++ must not be newer than version 8; Ubuntu 20.04 ships gcc 9 and needs a downgrade (see the sketch after this block)
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
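One way to downgrade on Ubuntu 20.04, a sketch assuming the stock Ubuntu repositories:

apt-get update && apt-get install -y gcc-8 g++-8
# Point the gcc/g++ symlinks at version 8 via update-alternatives
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80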

The build may then fail with the following error:

-- Using src='https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz'
CMake Error at third_party.protobuf-stamp/download-third_party.protobuf.cmake:159 (message):
  Each download failed!

    error: downloading 'https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz' failed
         status_code: 1
         status_string: "Unsupported protocol"
         log:
         --- LOG BEGIN ---
         Protocol "https" not supported or disabled in libcurl

  Closing connection -1


Fix:

(1) Download the dependency manually: https://link.zhihu.com/?target=https%3A//github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz

(2) Copy protobuf-cpp-3.0.0.tar.gz into the project's /build/third_party.protobuf/src/ directory;

(3) Edit the download script

Open /build/third_party.protobuf/src/third_party.protobuf-stamp/download-third_party.protobuf.cmake under the project directory,

and delete everything from the first if(EXISTS) to the last line of the file;

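The same three steps as a condensed shell sketch (paths assume the /opt/TensorRT7 layout used throughout):

cd /opt/TensorRT7/TensorRT/build/third_party.protobuf/src
wget https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz
# Then open third_party.protobuf-stamp/download-third_party.protobuf.cmake
# and delete everything from the first if(EXISTS) to the end of the file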

After these three steps, rerun make -j$(nproc); if all goes well the build should now run to completion.

4. Test

Enter TensorRT-7.0.0.11/python and install the TensorRT Python bindings:

root@53737c0fc560:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# ls
tensorrt-7.0.0.11-cp27-none-linux_x86_64.whl  tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp34-none-linux_x86_64.whl  tensorrt-7.0.0.11-cp37-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp35-none-linux_x86_64.whl

python3.6 -m pip install tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl

Set the environment variables. Without the shared libraries on the loader path, importing tensorrt fails:

>>> import tensorrt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorrt/__init__.py", line 1, in <module>
    from .tensorrt import *
ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory

Adding the TensorRT library directories to LD_LIBRARY_PATH fixes this:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/build/

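To persist the path for future shells (or bake it into a Dockerfile via ENV), append the exports to ~/.bashrc; a sketch:

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/build/' >> ~/.bashrc

With the path set, the import succeeds: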
root@8069c274e477:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# python3
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt as trt
>>> print(trt.__version__)
7.0.0.11
>>>


Install the remaining bundled wheel files:

# Install graphsurgeon for each Python interpreter you use
python3.7 -m pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
python3 -m pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
python3 -m pip install pycuda
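A quick sanity check that the bindings import cleanly (a sketch; pycuda.autoinit requires a visible GPU):

python3 -c "import tensorrt, pycuda.autoinit; print('TensorRT', tensorrt.__version__)"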

For docker deployment, keep the TensorRT source and tar package under /opt and rebuild there, so the image does not depend on any files outside the container.

cd /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/samples
# Build the samples
CUDNN_INSTALL_DIR=/usr/local/cuda/ make -j8
# The resulting executables are placed in /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin


root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ls
chobj                         sample_googlenet        sample_mnist_api_debug      sample_plugin                  sample_uff_mnist
common                        sample_googlenet_debug  sample_mnist_debug          sample_plugin_debug            sample_uff_mnist_debug
dchobj                        sample_int8             sample_movielens            sample_reformat_free_io        sample_uff_plugin_v2_ext
giexec                        sample_int8_api         sample_movielens_debug      sample_reformat_free_io_debug  sample_uff_plugin_v2_ext_debug
sample_char_rnn               sample_int8_api_debug   sample_movielens_mps        sample_ssd                     sample_uff_ssd
sample_char_rnn_debug         sample_int8_debug       sample_movielens_mps_debug  sample_ssd_debug               sample_uff_ssd_debug
sample_dynamic_reshape        sample_mlp              sample_nmt                  sample_uff_faster_rcnn         trtexec
sample_dynamic_reshape_debug  sample_mlp_debug        sample_nmt_debug            sample_uff_faster_rcnn_debug   trtexec_debug
sample_fasterRCNN             sample_mnist            sample_onnx_mnist           sample_uff_mask_rcnn
sample_fasterRCNN_debug       sample_mnist_api        sample_onnx_mnist_debug     sample_uff_mask_rcnn_debug


Run a test:

root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ./sample_char_rnn
&&&& RUNNING TensorRT.sample_char_rnn # ./sample_char_rnn
[05/24/2021-17:20:05] [I] Building and running a GPU inference engine for Char RNN model...
[05/24/2021-17:20:05] [I] Done reading weights from file...
[05/24/2021-17:20:05] [I] Done constructing network...
[05/24/2021-17:20:05] [V] [TRT] Applying generic optimizations to the graph for inference.
[05/24/2021-17:20:05] [V] [TRT] Original: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After dead-layer removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After Myelin optimization: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After scale fusion: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After vertical fusions: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After final dead-layer removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After tensor merging: 6 layers
[05/24/2021-17:20:05] [V] [TRT] After concat removal: 6 layers
[05/24/2021-17:20:05] [V] [TRT] Graph construction and optimization completed in 0.00174117 seconds.
[05/24/2021-17:20:07] [V] [TRT] Constructing optimization profile number 0 out of 1
*************** Autotuning format combination:  -> Float(1,512,33280) ***************
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination:  -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,512,512), Float(1,512,1024), Float(1,512,1024), Int32(1) -> Float(1,512,512), Float(1,512,1024), Float(1,512,1024) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 0) [RNN] (RNNv2)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 is the only option, timing skipped
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,512,33280), Float(1,512,512) -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 2) [Matrix Multiply] (MatrixMultiply)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 is the only option, timing skipped
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,1,65), Float(1,1,65) -> Float(1,1,65) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 4) [ElementWise] (ElementWise)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 1 time 0.004832
[05/24/2021-17:20:07] [V] [TRT] Tactic: 2 time 0.006144
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 1 Time: 0.004832
[05/24/2021-17:20:07] [V] [TRT] *************** Autotuning format combination: Float(1,1,65) -> Float(1,1,1), Int32(1,1,1) ***************
[05/24/2021-17:20:07] [V] [TRT] --------------- Timing Runner: (Unnamed Layer* 5) [TopK] (TopK)
[05/24/2021-17:20:07] [V] [TRT] Tactic: 0 time 0.00688
[05/24/2021-17:20:07] [V] [TRT] Tactic: 1 time 0.012288
[05/24/2021-17:20:07] [V] [TRT] Tactic: 3 time 0.009952
[05/24/2021-17:20:07] [V] [TRT] Tactic: 2 time 0.026624
[05/24/2021-17:20:07] [V] [TRT] Fastest Tactic: 0 Time: 0.00688
[05/24/2021-17:20:07] [V] [TRT] Formats and tactics selection completed in 0.0161162 seconds.
[05/24/2021-17:20:07] [V] [TRT] After reformat layers: 6 layers
[05/24/2021-17:20:07] [V] [TRT] Block size 33554432
[05/24/2021-17:20:07] [V] [TRT] Block size 2048
[05/24/2021-17:20:07] [V] [TRT] Block size 512
[05/24/2021-17:20:07] [V] [TRT] Total Activation Memory: 33556992
[05/24/2021-17:20:07] [I] [TRT] Detected 4 inputs and 3 output network tensors.
[05/24/2021-17:20:07] [V] [TRT] Engine generation completed in 2.03795 seconds.
[05/24/2021-17:20:07] [V] [TRT] Engine Layer Information:
[05/24/2021-17:20:07] [V] [TRT] Layer(Constant): (Unnamed Layer* 1) [Constant], Tactic: 0,  -> (Unnamed Layer* 1) [Constant]_output[Float(65,512)]
[05/24/2021-17:20:07] [V] [TRT] Layer(Constant): (Unnamed Layer* 3) [Constant], Tactic: 0,  -> (Unnamed Layer* 3) [Constant]_output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(RNN): (Unnamed Layer* 0) [RNN], Tactic: 0, data[Float(1,512)], hiddenIn[Float(2,512)], cellIn[Float(2,512)], seqLen[Int32()] -> RNN output[Float(1,512)], hiddenOut[Float(2,512)], cellOut[Float(2,512)]
[05/24/2021-17:20:07] [V] [TRT] Layer(MatrixMultiply): (Unnamed Layer* 2) [Matrix Multiply], Tactic: 0, (Unnamed Layer* 1) [Constant]_output[Float(65,512)], RNN output[Float(1,512)] -> Matrix Multiplicaton output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(ElementWise): (Unnamed Layer* 4) [ElementWise], Tactic: 1, Matrix Multiplicaton output[Float(65,1)], (Unnamed Layer* 3) [Constant]_output[Float(65,1)] -> Add Bias output[Float(65,1)]
[05/24/2021-17:20:07] [V] [TRT] Layer(TopK): (Unnamed Layer* 5) [TopK], Tactic: 0, Add Bias output[Float(65,1)] -> (Unnamed Layer* 5) [TopK]_output_1[Float(1,1)], pred[Int32(1,1)]
[05/24/2021-17:20:07] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[05/24/2021-17:20:07] [I] RNN warmup sentence: JACK
[05/24/2021-17:20:07] [I] Expected output: INGHAM:
What shall I
[05/24/2021-17:20:07] [I] Received: INGHAM:
What shall I
&&&& PASSED TensorRT.sample_char_rnn # ./sample_char_rnn

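Besides the samples, the trtexec binary in the same bin directory is handy for quick engine building and benchmarking; a sketch, where model.onnx is a hypothetical placeholder for your own model:

# Build an FP16 engine from an ONNX model and report timing (model.onnx is a placeholder)
./trtexec --onnx=model.onnx --fp16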

Reference:

https://zhuanlan.zhihu.com/p/346307138
