The current official conversion tool is ONNX-TensorRT: https://github.com/onnx/onnx-tensorrt
For details on trtexec usage, see https://blog.csdn.net/qq_29007291/article/details/116135737
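Besides the parser library, the onnx-tensorrt repository also ships a standalone onnx2trt converter. A minimal sketch of its use (file names are placeholders; check the README of your version for the exact options):
- onnx2trt my_model.onnx -o my_engine.trt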
trtexec has two main uses:
1. Benchmarking a network: if you have a model saved as a UFF file or an ONNX file, or a network description in Caffe prototxt format, you can use trtexec to measure inference performance. Note that if only a Caffe prototxt is supplied and no model (weights) file is given, random weights are generated. trtexec offers many options for specifying inputs and outputs, iterations for performance timing, allowed precisions, and so on.
2. Generating serialized engines: UFF, ONNX, and Caffe models can all be built into an engine.
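As a quick sanity check, a single invocation is enough to benchmark an ONNX model; trtexec builds the engine in memory and times inference with randomly generated inputs (the model path below is a placeholder):
- ./trtexec --onnx=/path/to/model.onnx --fp16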
Generating an engine
- #Build an engine from a Caffe model
- # --deploy: network description (prototxt, Caffe only); --model: weights file
- # --output: output node name(s), can be specified multiple times
- # --batch: batch size for the implicit-batch engine; --saveEngine: where to write the engine
- ./trtexec --deploy=/path/to/mnist.prototxt \
- --model=/path/to/mnist.caffemodel \
- --output=prob \
- --batch=16 \
- --saveEngine=mnist16.trt
-
- #Build an engine with INT8 precision enabled
- # --batch: batch size for the implicit-batch engine (default = 1)
- # --int8: enable int8 precision in addition to fp32 (default = disabled)
- # --buildOnly: skip the inference performance measurement
- ./trtexec --deploy=GoogleNet_N2.prototxt \
- --output=prob \
- --batch=1 \
- --saveEngine=g1.trt \
- --int8 \
- --buildOnly
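If an INT8 calibration cache has already been generated, it can be supplied with --calib (listed under Build Options in the help output below). A minimal sketch, where g1_calib.cache is a hypothetical pre-generated calibration cache:
- ./trtexec --deploy=GoogleNet_N2.prototxt \
- --output=prob \
- --int8 \
- --calib=g1_calib.cache \
- --saveEngine=g1_int8.trt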
Testing a network
- #Run a performance test with a saved engine
- ./trtexec --loadEngine=mnist16.trt --batch=16
-
- #Run the AlexNet network on the NVIDIA DLA (Deep Learning Accelerator) in FP16 mode
- # --useDLACore=1: run supported layers on DLA core 1
- # --fp16: enable fp16 precision in addition to fp32 (default = disabled)
- # --allowGPUFallback: when DLA is enabled, let unsupported layers fall back to the GPU (default = disabled)
- ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt \
- --output=prob \
- --useDLACore=1 \
- --fp16 \
- --allowGPUFallback
-
- #Run the AlexNet network on the DLA in INT8 mode (same flags as above, with --int8 instead of --fp16)
- ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt \
- --output=prob \
- --useDLACore=1 \
- --int8 \
- --allowGPUFallback
-
- #Benchmark the model, print the measured performance, and write the timing results to a json file
- ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt \
- --output=prob \
- --exportTimes=trace.json
-
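- #Per-layer timing can also be printed and exported to json (file names are placeholders;
- #--dumpProfile and --exportProfile are described in the --help output below)
- ./trtexec --loadEngine=mnist16.trt \
- --dumpProfile \
- --exportProfile=profile.json
-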
- #Tune throughput with multiple streams
- ./trtexec --loadEngine=g1.trt --batch=1 --streams=2
- ./trtexec --loadEngine=g1.trt --batch=1 --streams=3
- ./trtexec --loadEngine=g1.trt --batch=1 --streams=4
- ./trtexec --loadEngine=g2.trt --batch=2 --streams=2
- #Build an engine with a static batch size from an ONNX model
- # --explicitBatch: use explicit batch sizes when building the engine (default = implicit)
- # --workspace: workspace size in MB (default = 16)
- ./trtexec --onnx=<onnx_file> \
- --explicitBatch \
- --saveEngine=<tensorRT_engine_file> \
- --workspace=<size_in_megabytes> \
- --fp16
-
- #Build an engine with a dynamic batch size from an ONNX model
- # --minShapes: smallest input shape, as batch x channels x height x width
- # --optShapes: the shape the engine is optimized for (typically the same as --maxShapes)
- # --maxShapes: largest input shape
- ./trtexec --onnx=<onnx_file> \
- --minShapes=input:<shape_of_min_batch> \
- --optShapes=input:<shape_of_opt_batch> \
- --maxShapes=input:<shape_of_max_batch> \
- --workspace=<size_in_megabytes> \
- --saveEngine=<engine_file> \
- --fp16
Examples:
- #Smaller images allow a larger batch size, e.g. 8x3x416x416
- /home/zxl/TensorRT-7.2.3.4/bin/trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx \
- --minShapes=input:1x3x416x416 \
- --optShapes=input:8x3x416x416 \
- --maxShapes=input:8x3x416x416 \
- --workspace=4096 \
- --saveEngine=yolov4_-1_3_416_416_dynamic_b8_fp16.engine \
- --fp16
-
- #Reduced to 4x3x608x608 because GPU memory was not sufficient
- /home/zxl/TensorRT-7.2.3.4/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx \
- --minShapes=input:1x3x608x608 \
- --optShapes=input:4x3x608x608 \
- --maxShapes=input:4x3x608x608 \
- --workspace=4096 \
- --saveEngine=yolov4_-1_3_608_608_dynamic_b4_fp16.engine \
- --fp16
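A dynamic-shape engine still needs a concrete input shape at inference time, supplied with --shapes (see Inference Options below). For example, to benchmark the engine built above at batch size 4 (assuming the input tensor is named input, as in the build commands):
- ./trtexec --loadEngine=yolov4_-1_3_416_416_dynamic_b8_fp16.engine \
- --shapes=input:4x3x416x416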
Building an engine with trtexec also prints performance measurements by default. The complete list of options is shown by ./trtexec --help:
- (base) zxl@R7000P:~/TensorRT-7.2.3.4/bin$ ./trtexec --help
- &&&& RUNNING TensorRT.trtexec # ./trtexec --help
- === Model Options ===
- --uff=<file> UFF model
- --onnx=<file> ONNX model
- --model=<file> Caffe model (default = no model, random weights used)
- --deploy=<file> Caffe prototxt file
- --output=<name>[,<name>]* Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
- --uffInput=<name>,X,Y,Z Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
- --uffNHWC Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)
-
- === Build Options ===
- --maxBatch Set max batch size and build an implicit batch engine (default = 1)
- --explicitBatch Use explicit batch sizes when building the engine (default = implicit)
- --minShapes=spec Build with dynamic shapes using a profile with the min shapes provided
- --optShapes=spec Build with dynamic shapes using a profile with the opt shapes provided
- --maxShapes=spec Build with dynamic shapes using a profile with the max shapes provided
- --minShapesCalib=spec Calibrate with dynamic shapes using a profile with the min shapes provided
- --optShapesCalib=spec Calibrate with dynamic shapes using a profile with the opt shapes provided
- --maxShapesCalib=spec Calibrate with dynamic shapes using a profile with the max shapes provided
- Note: All three of min, opt and max shapes must be supplied.
- However, if only opt shapes is supplied then it will be expanded so
- that min shapes and max shapes are set to the same values as opt shapes.
- In addition, use of dynamic shapes implies explicit batch.
- Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
- Example input shapes spec: input0:1x3x256x256,input1:1x3x128x128
- Each input shape is supplied as a key-value pair where key is the input name and
- value is the dimensions (including the batch dimension) to be used for that input.
- Each key-value pair has the key and value separated using a colon (:).
- Multiple input shapes can be provided via comma-separated key-value pairs.
- --inputIOFormats=spec Type and format of each of the input tensors (default = all inputs in fp32:chw)
- See --outputIOFormats help for the grammar of type and format list.
- Note: If this option is specified, please set comma-separated types and formats for all
- inputs following the same order as network inputs ID (even if only one input
- needs specifying IO format) or set the type and format once for broadcasting.
- --outputIOFormats=spec Type and format of each of the output tensors (default = all outputs in fp32:chw)
- Note: If this option is specified, please set comma-separated types and formats for all
- outputs following the same order as network outputs ID (even if only one output
- needs specifying IO format) or set the type and format once for broadcasting.
- IO Formats: spec ::= IOfmt[","spec]
- IOfmt ::= type:fmt
- type ::= "fp32"|"fp16"|"int32"|"int8"
- fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8")["+"fmt]
- --workspace=N Set workspace size in megabytes (default = 16)
- --noBuilderCache Disable timing cache in builder (default is to enable timing cache)
- --nvtxMode=mode Specify NVTX annotation verbosity. mode ::= default|verbose|none
- --minTiming=M Set the minimum number of iterations used in kernel selection (default = 1)
- --avgTiming=M Set the number of times averaged in each iteration for kernel selection (default = 8)
- --noTF32 Disable tf32 precision (default is to enable tf32, in addition to fp32)
- --refit Mark the engine as refittable. This will allow the inspection of refittable layers
- and weights within the engine.
- --fp16 Enable fp16 precision, in addition to fp32 (default = disabled)
- --int8 Enable int8 precision, in addition to fp32 (default = disabled)
- --best Enable all precisions to achieve the best performance (default = disabled)
- --calib=<file> Read INT8 calibration cache file
- --safe Only test the functionality available in safety restricted flows
- --saveEngine=<file> Save the serialized engine
- --loadEngine=<file> Load a serialized engine
- --tacticSources=tactics Specify the tactics to be used by adding (+) or removing (-) tactics from the default
- tactic sources (default = all available tactics).
- Note: Currently only cuBLAS and cuBLAS LT are listed as optional tactics.
- Tactic Sources: tactics ::= [","tactic]
- tactic ::= (+|-)lib
- lib ::= "cublas"|"cublasLt"
-
- === Inference Options ===
- --batch=N Set batch size for implicit batch engines (default = 1)
- --shapes=spec Set input shapes for dynamic shapes inference inputs.
- Note: Use of dynamic shapes implies explicit batch.
- Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
- Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
- Each input shape is supplied as a key-value pair where key is the input name and
- value is the dimensions (including the batch dimension) to be used for that input.
- Each key-value pair has the key and value separated using a colon (:).
- Multiple input shapes can be provided via comma-separated key-value pairs.
- --loadInputs=spec Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
- Input values spec ::= Ival[","spec]
- Ival ::= name":"file
- --iterations=N Run at least N inference iterations (default = 10)
- --warmUp=N Run for N milliseconds to warmup before measuring performance (default = 200)
- --duration=N Run performance measurements for at least N seconds wallclock time (default = 3)
- --sleepTime=N Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
- --streams=N Instantiate N engines to use concurrently (default = 1)
- --exposeDMA Serialize DMA transfers to and from device. (default = disabled)
- --noDataTransfers Do not transfer data to and from the device during inference. (default = disabled)
- --useSpinWait Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
- --threads Enable multithreading to drive engines with independent threads (default = disabled)
- --useCudaGraph Use cuda graph to capture engine execution and then launch inference (default = disabled)
- --separateProfileRun Do not attach the profiler in the benchmark run; if profiling is enabled, a second profile run will be executed (default = disabled)
- --buildOnly Skip inference perf measurement (default = disabled)
-
- === Build and Inference Batch Options ===
- When using implicit batch, the max batch size of the engine, if not given,
- is set to the inference batch size;
- when using explicit batch, if shapes are specified only for inference, they
- will be used also as min/opt/max in the build profile; if shapes are
- specified only for the build, the opt shapes will be used also for inference;
- if both are specified, they must be compatible; and if explicit batch is
- enabled but neither is specified, the model must provide complete static
- dimensions, including batch size, for all inputs
-
- === Reporting Options ===
- --verbose Use verbose logging (default = false)
- --avgRuns=N Report performance measurements averaged over N consecutive iterations (default = 10)
- --percentile=P Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf) (default = 99%)
- --dumpRefit Print the refittable layers and weights from a refittable engine
- --dumpOutput Print the output tensor(s) of the last inference iteration (default = disabled)
- --dumpProfile Print profile information per layer (default = disabled)
- --exportTimes=<file> Write the timing results in a json file (default = disabled)
- --exportOutput=<file> Write the output tensors to a json file (default = disabled)
- --exportProfile=<file> Write the profile information per layer in a json file (default = disabled)
-
- === System Options ===
- --device=N Select cuda device N (default = 0)
- --useDLACore=N Select DLA core N for layers that support DLA (default = none)
- --allowGPUFallback When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
- --plugins Plugin library (.so) to load (can be specified multiple times)
-
- === Help ===
- --help, -h Print this message
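As noted under Build and Inference Batch Options above, when --shapes is supplied only for inference on an explicit-batch model, the same shapes are also used as the min/opt/max of the build profile. A minimal sketch of that shortcut (model, tensor, and file names are placeholders):
- ./trtexec --onnx=model.onnx \
- --explicitBatch \
- --shapes=input:1x3x224x224 \
- --saveEngine=model_static.trt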
Thanks to: https://blog.csdn.net/weixin_41562691/article/details/118277574