What is Triton? Triton is NVIDIA's Inference Server, built specifically for serving AI models in deployment. Clients send requests over HTTP/REST or gRPC; the full feature set is described in the official material below:
Official documentation: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
Product page: https://developer.nvidia.com/nvidia-triton-inference-server
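As a quick taste of the HTTP/REST side, once the server is running (installation follows below) you can poke the v2 API with plain curl. These are standard Triton v2 routes; resnet50_netdef is simply the example model used later in this post:
- $ curl localhost:8000/v2                          # server metadata: name, version, extensions
- $ curl localhost:8000/v2/models/resnet50_netdef   # metadata for one loaded model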
Triton can be installed either from source or as a container; here we take the container route as our example.
docker pull nvcr.io/nvidia/tritonserver:20.09-py3
For now the image can only be pulled from NVIDIA's registry, so the download will be slow without a proxy.
Once the image has been downloaded, create a container using the following template:
docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/example/model/repository:/models <docker image> tritonserver --model-repository=/models
where <docker image> should be replaced with nvcr.io/nvidia/tritonserver:20.09-py3. For example:
sudo docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/lthpc/workspace_zong/triton_server/repository:/models nvcr.io/nvidia/tritonserver:20.09-py3 tritonserver --model-repository=/models
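Note that the directory mounted at /models must be laid out as a Triton model repository. A minimal sketch, with illustrative names (config.pbtxt is required per model; the netdef files match the Caffe2 example model used later in this post):

repository/
  resnet50_netdef/           # one directory per model
    config.pbtxt             # model configuration: platform, inputs, outputs, batching
    1/                       # one directory per numeric version
      model.netdef           # the serialized model (file name/extension depends on the backend)
      init_model.netdef      # Caffe2 netdef models additionally need the init net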
If it instead fails at startup with an error like this:
- ERROR: This container was built for NVIDIA Driver Release 450.51 or later, but
- version 418.116.00 was detected and compatibility mode is UNAVAILABLE.
-
- [[CUDA Driver UNAVAILABLE (cuInit(0) returned 804)]]
then the graphics driver is too old and must be upgraded; see 《NVIDIA之显卡驱动安装方法》 (NVIDIA graphics driver installation guide) for the steps.
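You can check the currently installed driver version up front with nvidia-smi (the 20.09 image wants release 450.51 or newer, as the error message says):
- $ nvidia-smi --query-gpu=driver_version --format=csv,noheader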
On a successful start the output looks like this:
- =============================
- == Triton Inference Server ==
- =============================
-
- NVIDIA Release 20.09 (build 16016295)
-
- Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
-
- Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
- NVIDIA modifications are covered by the license terms that apply to the underlying
- project or file.
-
- I0927 03:36:55.184340 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
- I0927 03:36:55.190330 1 metrics.cc:193] GPU 0: TITAN V
- I0927 03:36:55.190594 1 server.cc:120] Initializing Triton Inference Server
- I0927 03:36:55.190606 1 server.cc:121] id: 'triton'
- I0927 03:36:55.190612 1 server.cc:122] version: '2.3.0'
- I0927 03:36:55.190618 1 server.cc:128] extensions: classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics
- I0927 03:36:55.507614 1 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7efe38000000' with size 268435456
- I0927 03:36:55.509141 1 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864
- I0927 03:36:55.515347 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
- I0927 03:36:55.515670 1 http_server.cc:2705] Started HTTPService at 0.0.0.0:8000
- I0927 03:36:55.556973 1 http_server.cc:2724] Started Metrics Service at 0.0.0.0:8002

Open a new terminal and verify the server with the command below:
- $ curl -v localhost:8000/v2/health/ready
- * Trying 127.0.0.1...
- * TCP_NODELAY set
- * Connected to localhost (127.0.0.1) port 8000 (#0)
- > GET /v2/health/ready HTTP/1.1
- > Host: localhost:8000
- > User-Agent: curl/7.61.0
- > Accept: */*
- >
- < HTTP/1.1 200 OK
- < Content-Length: 0
- < Content-Type: text/plain
- <
- * Connection #0 to host localhost left intact
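The metrics service on port 8002 can be spot-checked the same way; it returns Prometheus-format text with GPU and inference statistics (counters such as nv_gpu_utilization):
- $ curl localhost:8002/metrics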
At this point the server side is installed and running.
NVIDIA also ships a client image with example code; install it as follows.
Pull command (replace <xx.yy> with the release you need); here I use 20.09-py3-clientsdk as the example:
- $ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk
- $ docker pull nvcr.io/nvidia/tritonserver:20.09-py3-clientsdk
Launch command:
- $ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk
- $ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:20.09-py3-clientsdk
Once the client container is up, you can test with either of the methods below.
Binary client:
- $ /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
- Request 0, batch size 1
- Image 'images/mug.jpg':
- 0.723992 (504) = COFFEE MUG
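The same binary can also talk gRPC and report more than the top-1 class. A sketch assuming the -i (protocol), -u (server URL) and -c (number of classes) flags of the 20.09 example client:
- $ /workspace/install/bin/image_client -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION -c 3 /workspace/images/mug.jpg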
Python script client:
- $ python /workspace/install/python/image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
- Request 1, batch size 1
- 0.777365 (504) = COFFEE MUG
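Outside the SDK container you can make the same request in a few lines of Python. Below is a minimal sketch using the tritonclient pip package (pip install tritonclient[http]); note this is the current package layout (older SDKs such as 20.09 shipped the module as tritonhttpclient), and the tensor names gpu_0/data and gpu_0/softmax are assumptions taken from the example Caffe2 ResNet50 config, so check /v2/models/resnet50_netdef if yours differ:

import numpy as np
import tritonclient.http as httpclient

# Connect to the HTTP endpoint started above.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy input: the example model expects an FP32 CHW tensor of 3x224x224,
# with a leading batch dimension since the config enables batching.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

inputs = [httpclient.InferInput("gpu_0/data", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("gpu_0/softmax")]

result = client.infer("resnet50_netdef", inputs, outputs=outputs)
print(result.as_numpy("gpu_0/softmax").shape)  # expected: (1, 1000) class scores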
That completes the environment deployment; next comes the deeper investigation.