This article explains how to build a Triton Inference Server service from the official images and a TorchScript-format PyTorch model.
1.1. Download the tritonserver image: Triton Inference Server | NVIDIA NGC
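For example (the exact tag is an assumption; this article starts the server with the 22.12 release later on):

docker pull nvcr.io/nvidia/tritonserver:22.12-py3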
1.2. Next, pull the official PyTorch image to serve as the inference system's client and to perform some preprocessing (alternatively, you can pull the tritonserver client SDK image directly).
docker pull pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
# nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
# docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
1.3. Next, create a client container from the official PyTorch image.
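A minimal sketch of creating the container (the container name and mounted host path are assumptions; adjust networking so the container can reach the server's port 8000, e.g. --net=host on Linux):

docker run -it --gpus all --net=host -v /path/to/workspace:/workspace --name triton-client pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel /bin/bash

Inside the container, install the dependencies used below: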
pip install datasets transformers -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
pip install tritonclient[all] -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
Then convert the torchvision ResNet-50 to TorchScript and save it:

import torch
import torchvision.models as models

# Load a pretrained ResNet-50 and switch to inference mode
resnet50 = models.resnet50(pretrained=True)
resnet50.eval()

# Dummy input matching the shape declared in config.pbtxt: (1, 3, 224, 224)
image = torch.randn(1, 3, 224, 224)

# Trace the model into TorchScript format
resnet50_traced = torch.jit.trace(resnet50, image)

# Sanity-check forward pass on the eager model
resnet50(image)

# Save the traced model as model.pt
# resnet50_traced.save('/workspace/model/resnet50/model.pt')
torch.jit.save(resnet50_traced, "/workspace/model/resnet50/model.pt")
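As a quick sanity check (an optional sketch, not in the original), you can reload the saved TorchScript model and confirm it matches the eager model's output:

# Reload the traced model and compare outputs on the same dummy input
loaded = torch.jit.load("/workspace/model/resnet50/model.pt")
with torch.no_grad():
    print(torch.allclose(loaded(image), resnet50(image), atol=1e-5))  # expect True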
2.2. Then, pull the Triton Server code repository.
git clone -b r23.04 https://github.com/triton-inference-server/server.git
Configuration examples for several common backends live in the server/docs/examples directory:
tree docs/examples -L 2
docs/examples
|-- README.md
|-- fetch_models.sh
|-- jetson
|   |-- README.md
|   `-- concurrency_and_dynamic_batching
`-- model_repository
    |-- densenet_onnx
    |-- inception_graphdef
    |-- simple
    |-- simple_dyna_sequence
    |-- simple_identity
    |-- simple_int8
    |-- simple_sequence
    `-- simple_string

11 directories, 3 files
2.3. Pull the Triton Tutorials repository, which contains Triton tutorials and examples. This article walks through deploying a PyTorch model using the example under Quick_Deploy/PyTorch.
git clone https://github.com/triton-inference-server/tutorials.git
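A sketch of assembling the model repository (the source path assumes the location where model.pt was saved above):

mkdir -p model_repository/resnet50/1
cp /workspace/model/resnet50/model.pt model_repository/resnet50/1/model.pt

The resulting layout is: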
model_repository/
`-- resnet50
|-- 1
| `-- model.pt
`-- config.pbtxt
Here, config.pbtxt is the model configuration file; 1 is the model version number; resnet50 is the model name, which must match the name field in config.pbtxt; and model.pt is the model weights (i.e., the traced model saved above).
The config.pbtxt file contents are as follows:

name: "resnet50"
platform: "pytorch_libtorch"
max_batch_size : 0
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1, 1000, 1, 1 ]
    reshape { shape: [ 1, 1000 ] }
  }
]
The key fields are:
- name: the model name; it must match the model directory name.
- platform: the backend used to run the model; pytorch_libtorch is the LibTorch (TorchScript) backend.
- max_batch_size: the maximum batch size Triton will form; 0 disables Triton-managed batching, so dims describe the full tensor shape.
- input / output: the name, data_type, and dims of each tensor; for the LibTorch backend, tensor names follow the <name>__<index> convention.
- reshape: reshapes the tensor to the shape the model actually expects before (or after) execution.
After the model repository is built, the next step is to start the Triton inference server.
There are two ways to start the service: launch the container and execute the command in one step with docker, or enter the container and invoke the command manually.
Here we use docker to start and execute the command in one step:
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models
Parameter descriptions:
- --gpus all: expose all host GPUs to the container.
- --rm: remove the container when it exits.
- -p 8000:8000 -p 8001:8001 -p 8002:8002: map the HTTP (8000), gRPC (8001), and metrics (8002) ports to the host.
- -v .../model_repository:/models: mount the host model repository into the container at /models.
- --model-repository=/models: tell tritonserver where to find the model repository.
(base) PS C:\Users\lenovo> docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /D/chinasoft/shumei/triton/demo_first/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 516.94 which has support for CUDA 11.7. This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0804 01:46:15.003883 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x304800000' with size 268435456
I0804 01:46:15.004050 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0804 01:46:15.322720 1 model_lifecycle.cc:459] loading: resnet50:1
I0804 01:46:17.472054 1 libtorch.cc:1985] TRITONBACKEND_Initialize: pytorch
I0804 01:46:17.472105 1 libtorch.cc:1995] Triton TRITONBACKEND API version: 1.10
I0804 01:46:17.472587 1 libtorch.cc:2001] 'pytorch' TRITONBACKEND API version: 1.10
I0804 01:46:17.472634 1 libtorch.cc:2034] TRITONBACKEND_ModelInitialize: resnet50 (version 1)
W0804 01:46:17.473291 1 libtorch.cc:284] skipping model configuration auto-complete for 'resnet50': not supported for pytorch backend
I0804 01:46:17.473618 1 libtorch.cc:313] Optimized execution is enabled for model instance 'resnet50'
I0804 01:46:17.473624 1 libtorch.cc:332] Cache Cleaning is disabled for model instance 'resnet50'
I0804 01:46:17.473626 1 libtorch.cc:349] Inference Mode is disabled for model instance 'resnet50'
I0804 01:46:17.473640 1 libtorch.cc:444] NvFuser is not specified for model instance 'resnet50'
I0804 01:46:17.473699 1 libtorch.cc:2078] TRITONBACKEND_ModelInstanceInitialize: resnet50 (GPU device 0)
I0804 01:46:22.750763 1 model_lifecycle.cc:694] successfully loaded 'resnet50' version 1
I0804 01:46:22.750870 1 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0804 01:46:22.750917 1 server.cc:590]
+---------+----------------------------------------------------------+--------+
| Backend | Path | Config |
+---------+----------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+----------------------------------------------------------+--------+

I0804 01:46:22.750948 1 server.cc:633]
+----------+---------+--------+
| Model    | Version | Status |
+----------+---------+--------+
| resnet50 | 1       | READY  |
+----------+---------+--------+

I0804 01:46:22.810861 1 metrics.cc:864] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0804 01:46:22.811494 1 metrics.cc:757] Collecting CPU metrics
I0804 01:46:22.811657 1 tritonserver.cc:2264]
+----------------------------------+-----------+
| Option | Value |
+----------------------------------+-----------+
| server_id | triton |
| server_version | 2.29.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+-----------+

I0804 01:46:22.813086 1 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0804 01:46:22.813243 1 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0804 01:46:22.890915 1 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
W0804 01:46:23.822499 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:24.822769 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0804 01:46:25.831221 1 metrics.cc:603] Unable to get power limit for GPU 0. Status:Success, value:0.000000
You can see that the resnet50 model is in the READY state, but the GPU is not being used as intended, because of the warning above that my host driver version does not match the CUDA version the image was built with:
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 516.94 which has support for CUDA 11.7. This container
was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
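Once the server reports READY, you can verify it from the host using Triton's standard HTTP endpoints (assuming the default port mapping above):

curl -v localhost:8000/v2/health/ready
curl localhost:8000/v2/models/resnet50/config

Next, write the client script client.py to send an inference request: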
import numpy as np
from torchvision import transforms
from PIL import Image
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# Image preprocessing function
def rn50_preprocess(img_path="img1.jpg"):
    img = Image.open(img_path)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    return preprocess(img).numpy()

transformed_img = rn50_preprocess()

# Set up the client connection to the Triton server
client = httpclient.InferenceServerClient(url="localhost:8000")

# Specify the input and output of the resnet50 model
inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
inputs.set_data_from_numpy(transformed_img, binary_data=True)

# class_count requests the top-K classification results; if unset (default 0),
# the raw 1000-dimensional output vector is returned instead
outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

# Send an inference request to the Triton server
results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
inference_output = results.as_numpy('output__0')
print(inference_output[:5])
Download a test image and run the client:

wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python client.py
[b'12.474869:90' b'11.527128:92' b'9.659309:14' b'8.408504:136'
b'8.216769:11']
Each output entry has the format <confidence_score>:<classification_index>.
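To turn those indices into human-readable labels, one option (a sketch, not part of the original tutorial; it assumes torchvision >= 0.13 for the weights-metadata API) is to look them up in torchvision's bundled ImageNet category list:

# Hypothetical helper: map a classification index to its ImageNet label
from torchvision.models import ResNet50_Weights
categories = ResNet50_Weights.IMAGENET1K_V1.meta["categories"]
print(categories[90])  # index 90 from the top prediction above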