TensorRT source code: https://github.com/NVIDIA/TensorRT/tree/20.03
With CUDA 10.2 + cuDNN 7.6.5, the newest TensorRT release supported is TensorRT 7.0.0.
TensorRT 7.1 and later all require cuDNN 8.0 or newer, so pick the source branch that matches your environment.
git clone -b release/7.0 https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
Download the matching TensorRT GA tar package: https://developer.nvidia.com/nvidia-tensorrt-7x-download
# Extract the archive
tar -xvzf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz
# Copy the extracted TensorRT-7.0.0.11 directory into the TensorRT source tree cloned above
# It is recommended to move the combined tree into /opt/TensorRT7
# Enter the source directory, i.e. /opt/TensorRT7/TensorRT
export TRT_SOURCE=`pwd`
export TRT_RELEASE=`pwd`/TensorRT-7.0.0.11
export TENSORRT_LIBRARY_INFER=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer.so.7
export TENSORRT_LIBRARY_INFER_PLUGIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7
export TENSORRT_LIBRARY_MYELIN=$TRT_RELEASE/targets/x86_64-linux-gnu/lib/libmyelin.so
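Before building, it can save a failed CMake run to confirm that the library files those exports point to actually exist. The following is a hypothetical helper, not part of the official build; it only assumes the directory layout of the extracted tar package above:

```python
import os

# Hypothetical helper: report which of the TensorRT libraries referenced by
# the exports above are missing under a given TRT_RELEASE directory.
def missing_trt_libs(trt_release):
    lib_dir = os.path.join(trt_release, "targets", "x86_64-linux-gnu", "lib")
    libs = ["libnvinfer.so.7", "libnvinfer_plugin.so.7", "libmyelin.so"]
    return [os.path.join(lib_dir, lib) for lib in libs
            if not os.path.isfile(os.path.join(lib_dir, lib))]

if __name__ == "__main__":
    missing = missing_trt_libs(os.environ.get("TRT_RELEASE", "."))
    print("all libraries found" if not missing else "missing: %s" % missing)
```

If anything is reported missing, re-check that TensorRT-7.0.0.11 was copied to the path TRT_RELEASE points at.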
Before the actual build, check that the CUDA and cuDNN versions in TensorRT/CMakeLists.txt are correct.
Run the build:
# gcc/g++ must not be newer than version 8; Ubuntu 20.04 ships gcc 9, which needs to be downgraded
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_RELEASE/lib -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
You may hit the following error:
-- Using src='https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz'
CMake Error at third_party.protobuf-stamp/download-third_party.protobuf.cmake:159 (message):
Each download failed!
error: downloading 'https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz' failed
status_code: 1
status_string: "Unsupported protocol"
log:
--- LOG BEGIN ---
Protocol "https" not supported or disabled in libcurl
Closing connection -1
Fix:
(1) Download the dependency manually: https://github.com/google/protobuf/releases/download/v3.0.0/protobuf-cpp-3.0.0.tar.gz
(2) Copy protobuf-cpp-3.0.0.tar.gz into the project's build/third_party.protobuf/src/ directory;
(3) Edit the download script:
open build/third_party.protobuf/src/third_party.protobuf-stamp/download-third_party.protobuf.cmake under the project root,
and delete everything from the first if(EXISTS) through the last line of the file.
After these three steps, rerun make -j$(nproc) to continue the build. If all goes well it should complete cleanly.
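Step (3) can also be scripted rather than edited by hand. A minimal sketch, assuming the .cmake path given above and that the region to delete begins at the first line containing "if(EXISTS":

```python
# Sketch: truncate the CMake download script at the first "if(EXISTS" line,
# so CMake stops re-downloading and uses the locally copied tarball instead.
def truncate_at_if_exists(path):
    with open(path) as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if "if(EXISTS" in line:
            lines = lines[:i]  # drop if(EXISTS) through end of file
            break
    with open(path, "w") as f:
        f.writelines(lines)
```

Run it once on download-third_party.protobuf.cmake before restarting make.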
Enter TensorRT-7.0.0.11/python and install the TensorRT Python wheel:
root@53737c0fc560:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# ls
tensorrt-7.0.0.11-cp27-none-linux_x86_64.whl tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp34-none-linux_x86_64.whl tensorrt-7.0.0.11-cp37-none-linux_x86_64.whl
tensorrt-7.0.0.11-cp35-none-linux_x86_64.whl
python3.6 -m pip install tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl
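The wheel name must match your interpreter (cp36 for Python 3.6, cp37 for 3.7, and so on). A hypothetical helper to pick the right file from the listing above:

```python
import sys

# Hypothetical helper: pick the TensorRT wheel matching this interpreter.
# Wheel names follow the pattern tensorrt-7.0.0.11-cpXY-none-linux_x86_64.whl.
def matching_wheel(wheels, version_info=sys.version_info):
    tag = "cp%d%d" % (version_info[0], version_info[1])
    for name in wheels:
        if "-%s-" % tag in name:
            return name
    return None  # no wheel ships for this Python version
```

For example, on Python 3.6 it selects tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl; on a Python newer than 3.7 it returns None, since TensorRT 7.0 ships no wheel for it.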
Set the library path environment variables. Without them, importing tensorrt fails like this:
>>> import tensorrt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorrt/__init__.py", line 1, in <module>
from .tensorrt import *
ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT7/TensorRT/build/
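The exports above only fix the import if libnvinfer.so.7 really sits in one of the listed directories. A quick sketch that mimics the dynamic loader's LD_LIBRARY_PATH search, useful for debugging which directory (if any) provides the library:

```python
import os

# Sketch: return the first directory on LD_LIBRARY_PATH containing the given
# shared library, or None if the loader would not find it via that variable.
def find_in_ld_library_path(libname, env=os.environ):
    for d in env.get("LD_LIBRARY_PATH", "").split(":"):
        if d and os.path.isfile(os.path.join(d, libname)):
            return d
    return None
```

For example, find_in_ld_library_path("libnvinfer.so.7") should return the targets/x86_64-linux-gnu/lib directory once the exports take effect.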
root@8069c274e477:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/python# python3
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt as trt
>>> print(trt.__version__)
7.0.0.11
>>>
Install the other wheels under the python directory:
python3 -m pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
python3 -m pip install pycuda
For a Docker deployment, put the TensorRT source tree and the tar package under /opt and rebuild inside the container, so the image does not depend on files outside of Docker.
# Build the samples
cd /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/samples
CUDNN_INSTALL_DIR=/usr/local/cuda/ make -j8
# The resulting executables are in /opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin
root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ls
chobj  common  dchobj  giexec  trtexec  trtexec_debug
sample_char_rnn            sample_int8              sample_movielens      sample_reformat_free_io  sample_uff_mnist
sample_dynamic_reshape     sample_int8_api          sample_movielens_mps  sample_ssd               sample_uff_plugin_v2_ext
sample_fasterRCNN          sample_mlp               sample_nmt            sample_uff_faster_rcnn   sample_uff_ssd
sample_googlenet           sample_mnist             sample_onnx_mnist     sample_uff_mask_rcnn
sample_plugin              sample_mnist_api
(plus the corresponding *_debug builds of each sample)
Run a test:
root@50a0f0e0e2a0:/opt/TensorRT7/TensorRT/TensorRT-7.0.0.11/bin# ./sample_char_rnn
&&&& RUNNING TensorRT.sample_char_rnn # ./sample_char_rnn
[05/24/2021-17:20:05] [I] Building and running a GPU inference engine for Char RNN model...
[05/24/2021-17:20:05] [I] Done reading weights from file...
[05/24/2021-17:20:05] [I] Done constructing network...
[05/24/2021-17:20:05] [V] [TRT] Applying generic optimizations to the graph for inference.
[05/24/2021-17:20:05] [V] [TRT] Original: 6 layers
[05/24/2021-17:20:05] [V] [TRT] Graph construction and optimization completed in 0.00174117 seconds.
... (verbose [V] autotuning and layer-timing output omitted) ...
[05/24/2021-17:20:07] [V] [TRT] Formats and tactics selection completed in 0.0161162 seconds.
[05/24/2021-17:20:07] [V] [TRT] Total Activation Memory: 33556992
[05/24/2021-17:20:07] [I] [TRT] Detected 4 inputs and 3 output network tensors.
[05/24/2021-17:20:07] [V] [TRT] Engine generation completed in 2.03795 seconds.
[05/24/2021-17:20:07] [I] RNN warmup sentence: JACK
[05/24/2021-17:20:07] [I] Expected output: INGHAM: What shall I
[05/24/2021-17:20:07] [I] Received: INGHAM: What shall I
&&&& PASSED TensorRT.sample_char_rnn # ./sample_char_rnn
Reference:
https://zhuanlan.zhihu.com/p/346307138