source /etc/network_turbo # AutoDL platform only
pip config set global.index-url https://mirrors.pku.edu.cn/pypi/web/simple
pip install "xinference[vllm]"
几乎所有的最新模型
,这是 Pytorch 模型默认使用的引擎:pip install "xinference[transformers]"
GGML engine: when using the GGML engine, it is recommended to install the dependencies manually for your current hardware to get the best acceleration: pip install xinference ctransformers
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
pip install accelerate    # device placement for the Transformers engine
pip install bitsandbytes  # required for 4-bit/8-bit quantization of PyTorch-format models
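accelerate and bitsandbytes are what the Transformers engine relies on for 4-bit/8-bit loading. A quick way to verify that bitsandbytes works in your environment is an 8-bit load of a small model (the model name below is only an example, not part of the original setup):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load a small model in 8-bit purely as an environment check.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # example model; substitute any causal LM
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(f"8-bit footprint: {model.get_memory_footprint() / 1e6:.1f} MB")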
conda install nodejs
conda clean -i
# or remove the conda config file; $HOME is usually root, i.e. /root/.condarc
rm $HOME/.condarc
git clone https://github.com/xorbitsai/inference.git
cd inference
pip install -e .
xinference-local
npm cache clean --force   # run these inside the web UI source directory (xinference/web/ui)
npm install
npm run build
# -> then return to the directory containing setup.cfg and setup.py
pip install -e .
XINFERENCE_HOME=./models/ xinference-local --host 0.0.0.0 --port 9997  # XINFERENCE_HOME sets where models are downloaded and cached
HF_ENDPOINT=https://hf-mirror.com XINFERENCE_MODEL_SRC=modelscope xinference launch --model-name gemma-it --size-in-billions 2 --model-format pytorch --quantization 8-bit
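The same launch can also be issued programmatically. A self-contained sketch, assuming the server started above is reachable at localhost:9997; the flags mirror the CLI invocation:

from xinference.client import Client

client = Client("http://localhost:9997")
# Returns the UID subsequently used to address the model.
model_uid = client.launch_model(
    model_name="gemma-it",
    model_format="pytorch",
    size_in_billions=2,
    quantization="8-bit",
)
print(model_uid)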
from xinference.client import Client
client = Client("http://0.0.0.0:9997")
print(client.list_models())
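Beyond listing models, the native client can chat directly. The chat signature has changed across Xinference releases, so this sketch assumes a recent version where chat accepts an OpenAI-style messages list, and that the model's UID defaulted to its name:

# Reuses the client created above; 'gemma-it' assumes the default UID.
model = client.get_model("gemma-it")
response = model.chat(
    messages=[{"role": "user", "content": "What is the largest animal?"}]
)
print(response["choices"][0]["message"]["content"])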
import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
model_uid = 'gemma-it'
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model=model_uid,
    messages=[
        {
            "content": "What is the largest animal?",
            "role": "user",
        }
    ],
    max_tokens=1024
)
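Because the endpoint is OpenAI-compatible, streaming works exactly as it does against the official API. A brief sketch reusing the client and model_uid from the snippet above:

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model=model_uid,
    messages=[{"role": "user", "content": "Name three large animals."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # first/last chunks may carry no content
        print(delta, end="", flush=True)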