Install the appropriate CUDA version: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=11
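After it installs, the toolkit and driver can be verified with NVIDIA's standard tools:

nvcc --version
nvidia-smi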
You may need a proxy to download everything. You can install Chat With RTX by itself first and add the models manually afterwards.
See the official docs: https://github.com/NVIDIA/TensorRT-LLM/blob/rel/windows/README.md
Reference command: pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
Example (using the Python bundled with Chat With RTX): env_nvd_rag\python.exe -m pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
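A quick sanity check that the install worked, run with the same env_nvd_rag\python.exe (a minimal sketch; it only confirms the packages import and the GPU is visible):

import torch
import tensorrt_llm

# torch comes from the cu121 index used above
print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("tensorrt_llm", tensorrt_llm.__version__)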
For Qwen, for example, see: https://github.com/NVIDIA/TensorRT-LLM/blob/rel/examples/qwen/README.md
First install the example's dependencies, as shown below.
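For example, from the examples/qwen directory of a TensorRT-LLM checkout (each example ships its own requirements.txt):

env_nvd_rag\python.exe -m pip install -r requirements.txt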
Then build the engine following the documentation's instructions; a sketch of the command follows.
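For orientation only, the int4 weight-only build in that README looked roughly like the following (flags recalled from the rel-branch qwen example and likely to differ between versions, so treat the README as authoritative; the checkpoint and output paths are placeholders):

python build.py --hf_model_dir .\Qwen-1_8B-Chat ^
    --dtype float16 ^
    --use_weight_only ^
    --weight_only_precision int4 ^
    --output_dir .\trt_engines\int4_weight_only\1-gpu

The output directory and the resulting qwen_float16_tp1_rank0.engine file are what the config entry below points at.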
See the section "3. Installing TensorRT-LLM" above.
Next, modify the model configuration file; the Qwen entry added alongside the stock Mistral entry looks like this:
{ "name": "Qwen 1.8B Chat int4", "installed": true, "metadata": { "model_path": "model\\Qwen\\Qwen-1_8B-Chat\\trt_engines\\int4_weight_only\\1-gpu", "engine": "qwen_float16_tp1_rank0.engine", "tokenizer_path": "model\\Qwen\\Qwen-1_8B-Chat", "max_new_tokens": 1024, "max_input_token": 4096, "temperature": 0.1 } }, { "name": "Mistral 7B int4", "installed": false, "metadata": { "model_path": "model\\mistral\\mistral7b_int4_engine", "engine": "llama_float16_tp1_rank0.engine", "tokenizer_path": "model\\mistral\\mistral7b_hf", "max_new_tokens": 1024, "max_input_token": 7168, "temperature": 0.1 } },
Use the officially provided version.
For the modification, refer to "F:\ChatWithRTX\RAG\trt-llm-rag-windows-main\ui\user_interface.py":
def _validate_request(self, request: gr.Request):
    headers = request.headers
    session_key = None
    if 'cookie' in headers:
        cookies = headers['cookie']
        if '_s_chat_' in cookies:
            cookies = cookies.split('; ')
            for cookie in cookies:
                key, value = cookie.split('=', 1)  # use the maxsplit parameter here
                if key == '_s_chat_':
                    session_key = value
    if session_key is None or session_key != self._secure_cookie:
        raise Exception('session validation failed')  # raise an Exception instead of a bare string
    return True
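Why the maxsplit=1 argument matters: a cookie value can itself contain '=', and without it the two-element unpacking raises ValueError:

"_s_chat_=abc=123".split('=', 1)  # -> ['_s_chat_', 'abc=123']
"_s_chat_=abc=123".split('=')     # -> ['_s_chat_', 'abc', '123']; key, value = ... fails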
The web UI now starts normally, but Qwen is unusable while the default model works... (still investigating the Qwen model issue)
Asked GPT for ideas. Reinstalled CUDA and cuDNN and configured the environment variables, but the error persisted. Then reinstalled Chat With RTX, this time ticking the model checkboxes during install, and surprisingly the error was gone. Interesting.
The browser opens automatically and loads the web UI.