Through the TensorRT-LLM hands-on bootcamp I learned about TensorRT-LLM, NVIDIA's open-source framework for accelerating large language model inference (implemented in C++, with a Python API package, tensorrt_llm). After watching the course video "NVIDIA LLM Full-Stack Solution: Usage and Optimization Best Practices", I set up the environment and got the summarize.py model test running. This post records the main steps and the problems I ran into.
Looking at the NVIDIA Container Toolkit architecture: the lower layers depend on an NVIDIA GPU and a Docker environment, while the application layer depends on the CUDA Toolkit.
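Before going further, it helps to confirm that the whole stack (driver, container runtime, CUDA userspace) is wired up. This is a small sketch, assuming Docker and the container toolkit are already installed; the image tag just matches the one used later in this post.

```shell
# Sketch: if the driver, container runtime, and CUDA userspace all work,
# this prints the GPU table from inside a throwaway container.
gpu_smoke_test() {
  local image="${1:-nvidia/cuda:12.1.0-devel-ubuntu22.04}"
  docker run --rm --gpus all "$image" nvidia-smi
}
```

If `nvidia-smi` fails here but works on the host, the container toolkit layer is the part to debug.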
$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
$ sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-4
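To confirm the toolkit actually landed, a quick check is to ask nvcc for its version. A minimal sketch; the nvcc path below assumes the default location for the deb packages, so adjust it if your install differs.

```shell
# Sketch: verify the CUDA toolkit install by printing the nvcc release.
# /usr/local/cuda/bin/nvcc is the default path for the deb install.
check_cuda() {
  local nvcc_bin="${1:-/usr/local/cuda/bin/nvcc}"
  if [ -x "$nvcc_bin" ]; then
    "$nvcc_bin" --version | grep -o 'release [0-9.]*'
  else
    echo "nvcc not found at $nvcc_bin" >&2
    return 1
  fi
}
```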
# Start the CUDA development container (run on the host)
$ docker run --runtime=nvidia --gpus all --entrypoint /bin/bash \
    --name TensorRT-LLM \
    -itd nvidia/cuda:12.1.0-devel-ubuntu22.04
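The container is started detached (`-itd`), so attach to it before running the steps below, which all happen inside the container. A small convenience sketch (the helper name is mine, not part of any tool):

```shell
# Sketch: attach a shell to the (possibly stopped) TensorRT-LLM container.
attach_trtllm() {
  local name="${1:-TensorRT-LLM}"
  docker start "$name" >/dev/null 2>&1 || true  # no-op if already running
  docker exec -it "$name" /bin/bash
}
```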
$ apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
# Install git-lfs (git comes in as a dependency)
$ apt-get install git-lfs
$ git lfs install
# Download the TensorRT-LLM repo
$ git clone https://github.com/NVIDIA/TensorRT-LLM.git
# Install the Python dependencies
$ cd TensorRT-LLM/
$ pip install -r examples/bloom/requirements.txt
# The model goes under TensorRT-LLM/examples/bloom/bloom/560M
$ cd examples/bloom
$ mkdir -p bloom/560M
$ cd bloom
# Download the model from Hugging Face (may fail behind a firewall)
$ git clone https://huggingface.co/bigscience/bloom-560m 560M
# If the Hugging Face download fails, use the gitee mirror instead
$ git clone https://gitee.com/modelee/bloom-560m.git 560M
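Either clone can silently fetch only LFS pointer files if git-lfs wasn't set up first, and the conversion step will then fail on tiny "weight" files. A quick way to tell a pointer from a real weight file, based on the spec line git-lfs writes into pointers:

```shell
# Sketch: un-fetched git-lfs pointer files are small text files starting
# with the LFS spec line; real weight files are hundreds of MB of binary.
is_lfs_pointer() {
  head -c 64 "$1" | grep -q '^version https://git-lfs'
}
# Example: flag any un-fetched files in the model directory
# for f in 560M/*.bin; do is_lfs_pointer "$f" && echo "pointer: $f"; done
```

If any weights turn out to be pointers, `git lfs pull` inside the clone fetches the real files.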
# Back to examples/bloom
$ cd ..
# Single GPU on BLOOM 560M
$ python convert_checkpoint.py --model_dir ./bloom/560M/ \
--dtype float16 \
--output_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/
# Enable CUDA lazy loading
$ export CUDA_MODULE_LOADING=LAZY
# May need to add trtllm-build to PATH, export PATH=/usr/local/bin:$PATH
$ trtllm-build --checkpoint_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/ \
--gemm_plugin float16 \
--output_dir ./bloom/560M/trt_engines/fp16/1-gpu/
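If the build succeeds, the engine directory should hold the serialized engine plus its config. A sanity-check sketch; the file names (`config.json`, `*.engine`) are an assumption based on typical trtllm-build output and may differ across versions.

```shell
# Sketch: confirm trtllm-build produced an engine in the output directory.
has_engine() {
  local dir="$1"
  [ -f "$dir/config.json" ] && ls "$dir"/*.engine >/dev/null 2>&1
}
# has_engine ./bloom/560M/trt_engines/fp16/1-gpu/ && echo "engine ready"
```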
# Use a Hugging Face mirror, otherwise the dataset download fails with ConnectionError: Couldn't reach 'ccdv/cnn_dailymail' on the Hub (SSLError)
$ export HF_ENDPOINT=https://hf-mirror.com
# Use a GitHub mirror (the sed below points the cached dataset loader at gitmirror), otherwise: ConnectionError: Couldn't reach https://raw.githubusercontent.com/abisee/cnn-dailymail/master/url_lists/all_test.txt
$ sed -i 's/githubusercontent/gitmirror/g' /root/.cache/huggingface/modules/datasets_modules/datasets/ccdv--cnn_dailymail/*/cnn_dailymail.py
# Run the model to summarize the input articles
$ python ../summarize.py --test_trt_llm \
--hf_model_dir ./bloom/560M/ \
--data_type fp16 \
--engine_dir ./bloom/560M/trt_engines/fp16/1-gpu/