
Local Deployment of a FastGPT + One-API + ChatGLM3-6B Knowledge Base


1. Step One: Set Up the Local Model

1. ChatGLM3-6B source code
https://github.com/THUDM/ChatGLM3

2. FastGPT
https://github.com/labring/FastGPT

3. The m3e embedding model (can be deployed with Docker)
https://huggingface.co/moka-ai/m3e-base

First install Anaconda3. It can be downloaded from the Tsinghua Open Source Mirror (Index of /anaconda/archive/, https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/).

First create an environment named chatglm3-demo:

conda create -n chatglm3-demo python=3.10
conda activate chatglm3-demo

ChatGLM3-6B requires Python 3.10 or later.

cd /ChatGLM3-main/composite_demo

Switch to the demo directory to test the model. Install requirements.txt first; you can delete torch from it and install it separately.

pip install -r requirements.txt

The Code Interpreter also needs a Jupyter kernel. Create a system environment variable named IPYKERNEL with the value chatglm3-demo, then install the kernel:

ipython kernel install --name chatglm3-demo --user

Install the matching versions of PyTorch and torchvision:

torch-2.0.0+cu117-cp310-cp310-win_amd64.whl
torchvision-0.15.0+cu117-cp310-cp310-win_amd64.whl

Install them from the directory where the wheels were downloaded:

pip install torch-2.0.0+cu117-cp310-cp310-win_amd64.whl
pip install torchvision-0.15.0+cu117-cp310-cp310-win_amd64.whl

You can search for the matching builds at https://download.pytorch.org/whl/

Verify that the GPU is being used:

import torch
import torchvision
from transformers import __version__ as transformers_version

print("PyTorch version:", torch.__version__)
print("Transformers version:", transformers_version)
print("TorchVision version:", torchvision.__version__)

# Check whether a GPU is available
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("GPU TRUE")
else:
    print("GPU FALSE")

# Check the versions of other libraries
# Add checks for other libraries here if needed

Open composite_demo/client.py and change the model path.
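For reference, the change is usually just the model path constant near the top of client.py (a minimal sketch; the exact variable names depend on the ChatGLM3 revision you cloned, and the path below is only a placeholder for wherever your local weights live):

import os

# Point the demo at the local weights instead of pulling them from Hugging Face.
MODEL_PATH = os.environ.get('MODEL_PATH', 'D:/models/chatglm3-6b')  # placeholder path
TOKENIZER_PATH = os.environ.get('TOKENIZER_PATH', MODEL_PATH)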

Run the following command to test; install any modules reported as missing:

streamlit run main.py

If GPU memory is below 12 GB, responses will be very slow; after switching to a quantized model, normal conversation works.
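The quantization change follows the same pattern that openai_api.py uses further below: call .quantize() when loading the model. A sketch, assuming the demo loads the model with transformers' AutoModel (adapt it to however your client.py builds the model; the path is a placeholder):

from transformers import AutoModel

MODEL_PATH = 'D:/models/chatglm3-6b'  # placeholder path

# int4 quantization brings chatglm3-6b down to roughly 5-6 GB of VRAM;
# use .quantize(8) instead if you have a bit more headroom.
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().quantize(4).cuda()
model = model.eval()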

Once conversation works normally, the code and environment are confirmed to run.

Switch to D:\...\ChatGLM3-main\openai_api_demo

Edit openai_api.py to point to the local ChatGLM3 model path.

If you have other models, keep them in the same directory.

Test with Postman using a request body like this:

{
  "model": "string",
  "messages": [
    {
      "role": "user",
      "content": "你好",
      "name": "string",
      "function_call": {
        "name": "string",
        "arguments": "string"
      }
    }
  ],
  "temperature": 0.8,
  "top_p": 0.8,
  "max_tokens": 0,
  "stream": false,
  "functions": {},
  "repetition_penalty": 1.1
}

Once that succeeds, run:

python openai_api.py
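If you prefer a script over Postman, the same request can be sent with Python's requests library once the server is up (a sketch; the demo listens on port 8000 by default):

import requests

payload = {
    "model": "chatglm3-6b",
    "messages": [{"role": "user", "content": "你好"}],
    "temperature": 0.8,
    "top_p": 0.8,
    "max_tokens": 256,
    "stream": False,
}

resp = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
# Print only the assistant's reply text.
print(resp.json()["choices"][0]["message"]["content"])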

The following code can be used as a drop-in replacement for openai_api.py:

# coding=utf-8
# Implements API for ChatGLM3-6B in OpenAI's format. (https://platform.openai.com/docs/api-reference/chat)
# Usage: python openai_api.py
# Visit http://localhost:8000/docs for documents.
# In the OpenAI API, max_tokens is equivalent to HuggingFace's max_new_tokens, not max_length.
# For example, for the 6B model, setting max_tokens = 8192 will raise an error, because after deducting
# the history and the prompt, the model cannot output that many tokens.

import os
import time
import json
from contextlib import asynccontextmanager
from typing import List, Literal, Optional, Union

import torch
from torch.cuda import get_device_properties
import uvicorn
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from loguru import logger
from pydantic import BaseModel, Field
from sse_starlette.sse import EventSourceResponse
from transformers import AutoTokenizer, AutoModel

from utils import process_response, generate_chatglm3, generate_stream_chatglm3

MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM_chatglm3-6b')
TOKENIZER_PATH = os.environ.get("TOKENIZER_PATH", MODEL_PATH)
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'


@asynccontextmanager
async def lifespan(app: FastAPI):  # collects GPU memory
    yield
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()


app = FastAPI(lifespan=lifespan)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


class ModelCard(BaseModel):
    id: str
    object: str = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: str = "owner"
    root: Optional[str] = None
    parent: Optional[str] = None
    permission: Optional[list] = None


class ModelList(BaseModel):
    object: str = "list"
    data: List[ModelCard] = []


class FunctionCallResponse(BaseModel):
    name: Optional[str] = None
    arguments: Optional[str] = None


class ChatMessage(BaseModel):
    role: Literal["user", "assistant", "system", "function"]
    content: str = None
    name: Optional[str] = None
    function_call: Optional[FunctionCallResponse] = None


class DeltaMessage(BaseModel):
    role: Optional[Literal["user", "assistant", "system"]] = None
    content: Optional[str] = None
    function_call: Optional[FunctionCallResponse] = None


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    temperature: Optional[float] = 0.8
    top_p: Optional[float] = 0.8
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    functions: Optional[Union[dict, List[dict]]] = None
    # Additional parameters
    repetition_penalty: Optional[float] = 1.1


class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatMessage
    finish_reason: Literal["stop", "length", "function_call"]


class ChatCompletionResponseStreamChoice(BaseModel):
    index: int
    delta: DeltaMessage
    finish_reason: Optional[Literal["stop", "length", "function_call"]]


class UsageInfo(BaseModel):
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0


class ChatCompletionResponse(BaseModel):
    model: str
    object: Literal["chat.completion", "chat.completion.chunk"]
    choices: List[Union[ChatCompletionResponseChoice, ChatCompletionResponseStreamChoice]]
    created: Optional[int] = Field(default_factory=lambda: int(time.time()))
    usage: Optional[UsageInfo] = None


@app.get("/v1/models", response_model=ModelList)
async def list_models():
    model_card = ModelCard(id="chatglm3-6b")
    return ModelList(data=[model_card])


@app.post("/v1/chat/completions", response_model=ChatCompletionResponse)
async def create_chat_completion(request: ChatCompletionRequest):
    global model, tokenizer

    if len(request.messages) < 1 or request.messages[-1].role == "assistant":
        raise HTTPException(status_code=400, detail="Invalid request")

    gen_params = dict(
        messages=request.messages,
        temperature=request.temperature,
        top_p=request.top_p,
        max_tokens=request.max_tokens or 1024,
        echo=False,
        stream=request.stream,
        repetition_penalty=request.repetition_penalty,
        functions=request.functions,
    )

    logger.debug(f"==== request ====\n{gen_params}")

    if request.stream:
        # Use the stream mode to read the first few characters; if it is not a function call, stream the output directly.
        predict_stream_generator = predict_stream(request.model, gen_params)
        output = next(predict_stream_generator)
        if not contains_custom_function(output):
            return EventSourceResponse(predict_stream_generator, media_type="text/event-stream")

        # Obtain the result directly at one time and determine whether tools need to be called.
        logger.debug(f"First result output:\n{output}")

        function_call = None
        if output and request.functions:
            try:
                function_call = process_response(output, use_tool=True)
            except:
                logger.warning("Failed to parse tool call")

        # CallFunction
        if isinstance(function_call, dict):
            function_call = FunctionCallResponse(**function_call)

            """
            In this demo, we did not register any tools.
            You can use the tools that have been implemented in our `tool_using` and implement your own streaming tool implementation here.
            Similar to the following method:
                function_args = json.loads(function_call.arguments)
                tool_response = dispatch_tool(tool_name: str, tool_params: dict)
            """
            tool_response = ""

            if not gen_params.get("messages"):
                gen_params["messages"] = []

            gen_params["messages"].append(ChatMessage(
                role="assistant",
                content=output,
            ))
            gen_params["messages"].append(ChatMessage(
                role="function",
                name=function_call.name,
                content=tool_response,
            ))

            # Streaming output of results after function calls
            generate = predict(request.model, gen_params)
            return EventSourceResponse(generate, media_type="text/event-stream")

        else:
            # Handled to avoid exceptions in the above parsing function process.
            generate = parse_output_text(request.model, output)
            return EventSourceResponse(generate, media_type="text/event-stream")

    # Here is the handling of stream = False
    response = generate_chatglm3(model, tokenizer, gen_params)

    # Remove the first newline character
    if response["text"].startswith("\n"):
        response["text"] = response["text"][1:]
    response["text"] = response["text"].strip()

    usage = UsageInfo()
    function_call, finish_reason = None, "stop"
    if request.functions:
        try:
            function_call = process_response(response["text"], use_tool=True)
        except:
            logger.warning("Failed to parse tool call, maybe the response is not a tool call or have been answered.")

    if isinstance(function_call, dict):
        finish_reason = "function_call"
        function_call = FunctionCallResponse(**function_call)

    message = ChatMessage(
        role="assistant",
        content=response["text"],
        function_call=function_call if isinstance(function_call, FunctionCallResponse) else None,
    )

    logger.debug(f"==== message ====\n{message}")

    choice_data = ChatCompletionResponseChoice(
        index=0,
        message=message,
        finish_reason=finish_reason,
    )

    task_usage = UsageInfo.model_validate(response["usage"])
    for usage_key, usage_value in task_usage.model_dump().items():
        setattr(usage, usage_key, getattr(usage, usage_key) + usage_value)

    return ChatCompletionResponse(model=request.model, choices=[choice_data], object="chat.completion", usage=usage)


async def predict(model_id: str, params: dict):
    global model, tokenizer

    choice_data = ChatCompletionResponseStreamChoice(
        index=0,
        delta=DeltaMessage(role="assistant"),
        finish_reason=None
    )
    chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
    yield "{}".format(chunk.model_dump_json(exclude_unset=True))

    previous_text = ""
    for new_response in generate_stream_chatglm3(model, tokenizer, params):
        decoded_unicode = new_response["text"]
        delta_text = decoded_unicode[len(previous_text):]
        previous_text = decoded_unicode

        finish_reason = new_response["finish_reason"]
        if len(delta_text) == 0 and finish_reason != "function_call":
            continue

        function_call = None
        if finish_reason == "function_call":
            try:
                function_call = process_response(decoded_unicode, use_tool=True)
            except:
                logger.warning(
                    "Failed to parse tool call, maybe the response is not a tool call or have been answered.")

        if isinstance(function_call, dict):
            function_call = FunctionCallResponse(**function_call)

        delta = DeltaMessage(
            content=delta_text,
            role="assistant",
            function_call=function_call if isinstance(function_call, FunctionCallResponse) else None,
        )

        choice_data = ChatCompletionResponseStreamChoice(
            index=0,
            delta=delta,
            finish_reason=finish_reason
        )
        chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
        yield "{}".format(chunk.model_dump_json(exclude_unset=True))

    choice_data = ChatCompletionResponseStreamChoice(
        index=0,
        delta=DeltaMessage(),
        finish_reason="stop"
    )
    chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
    yield "{}".format(chunk.model_dump_json(exclude_unset=True))
    yield '[DONE]'


def predict_stream(model_id, gen_params):
    """
    The function call is compatible with stream mode output.

    The first seven characters are determined.
    If not a function call, the stream output is directly generated.
    Otherwise, the complete character content of the function call is returned.

    :param model_id:
    :param gen_params:
    :return:
    """
    output = ""
    is_function_call = False
    has_send_first_chunk = False
    for new_response in generate_stream_chatglm3(model, tokenizer, gen_params):
        decoded_unicode = new_response["text"]
        delta_text = decoded_unicode[len(output):]
        output = decoded_unicode

        # When it is not a function call and the character length is > 7,
        # try to judge whether it is a function call according to the special function prefix
        if not is_function_call and len(output) > 7:

            # Determine whether a function is called
            is_function_call = contains_custom_function(output)
            if is_function_call:
                continue

            # Non-function call, direct stream output
            finish_reason = new_response["finish_reason"]

            # Send an empty string first to avoid truncation by subsequent next() operations.
            if not has_send_first_chunk:
                message = DeltaMessage(
                    content="",
                    role="assistant",
                    function_call=None,
                )
                choice_data = ChatCompletionResponseStreamChoice(
                    index=0,
                    delta=message,
                    finish_reason=finish_reason
                )
                chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
                yield "{}".format(chunk.model_dump_json(exclude_unset=True))

            send_msg = delta_text if has_send_first_chunk else output
            has_send_first_chunk = True
            message = DeltaMessage(
                content=send_msg,
                role="assistant",
                function_call=None,
            )
            choice_data = ChatCompletionResponseStreamChoice(
                index=0,
                delta=message,
                finish_reason=finish_reason
            )
            chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
            yield "{}".format(chunk.model_dump_json(exclude_unset=True))

    if is_function_call:
        yield output
    else:
        yield '[DONE]'


async def parse_output_text(model_id: str, value: str):
    """
    Directly output the text content of value

    :param model_id:
    :param value:
    :return:
    """
    choice_data = ChatCompletionResponseStreamChoice(
        index=0,
        delta=DeltaMessage(role="assistant", content=value),
        finish_reason=None
    )
    chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
    yield "{}".format(chunk.model_dump_json(exclude_unset=True))

    choice_data = ChatCompletionResponseStreamChoice(
        index=0,
        delta=DeltaMessage(),
        finish_reason="stop"
    )
    chunk = ChatCompletionResponse(model=model_id, choices=[choice_data], object="chat.completion.chunk")
    yield "{}".format(chunk.model_dump_json(exclude_unset=True))
    yield '[DONE]'


def contains_custom_function(value: str) -> bool:
    """
    Determine whether 'function_call' according to a special function prefix.

    For example, the functions defined in "tool_using/tool_register.py" are all "get_xxx" and start with "get_"

    [Note] This is not a rigorous judgment method, only for reference.

    :param value:
    :return:
    """
    return value and 'get_' in value


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)

    if torch.cuda.is_available():
        total_vram_in_gb = get_device_properties(0).total_memory / 1073741824
        print(f'\033[32mGPU memory: {total_vram_in_gb:.2f} GB\033[0m')
        with torch.cuda.device(f'cuda:{0}'):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        if total_vram_in_gb > 13:
            model = model.half().cuda()
            print(f'\033[32mRunning on GPU with fp16 precision\033[0m')
        elif total_vram_in_gb > 10:
            model = model.half().quantize(8).cuda()
            print(f'\033[32mRunning on GPU with int8 quantization\033[0m')
        elif total_vram_in_gb > 4.5:
            model = model.half().quantize(4).cuda()
            print(f'\033[32mRunning on GPU with int4 quantization\033[0m')
        else:
            model = model.float()
            print('\033[32mRunning on CPU\033[0m')
    else:
        model = model.float()
        print('\033[32mRunning on CPU\033[0m')

    model = model.eval()

    # bilibili@十字鱼 https://space.bilibili.com/893892 Thanks to 秋葉aaaki and 大江户战士 for the references
    uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)

2. Deploy One-API

One-API acts as the gateway node for calling the various models. Its documentation recommends Docker deployment; Ubuntu 20.04 works well, with virtualization enabled on the Windows side. VirtualBox is used here with VT-x/AMD-V turned on, which requires enabling virtualization in the BIOS (on some motherboards it sits under the security settings). Add network port-forwarding rules for 3000, 13000, and so on, as needed.

Open Ubuntu 20; updating the software may take a while. Install Code and Terminator for the later steps. First sort out permissions: checking Docker and the Docker daemon involves permission issues. You can add your user to the docker group; running with administrator privileges is recommended.

sudo usermod -aG docker <username>

Open Code, create a new terminal, and pull the one-api image, exposed on port 13000:

docker run --name one-api -d --restart always -p 13000:3000 -e TZ=Asia/Shanghai -v /home/ubuntu/data/one-api:/data justsong/one-api

Go to localhost:13000 and log in as root (default password 123456).

For the chatglm3 channel, Base URL: http://localhost:8000

Then add an m3e channel, Base URL: http://localhost:6200

Add a new token and submit. Copy the first item under the arrow and paste it into a txt file, for example: https://chat.oneapi.pro/#/?settings={"key":"sk-fAAfFClsyVXxvAgp57Ab758260124a958aF00a2d49CcB625","url":"http://localhost:3000"}
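To confirm that One-API is actually forwarding requests to the chatglm3 channel, you can call One-API's OpenAI-compatible endpoint on port 13000 with the new token (a sketch; the key below is a placeholder, and the model name must match the one configured on the channel):

import requests

ONE_API_KEY = "sk-xxxx"  # placeholder: paste the token copied from the One-API console

resp = requests.post(
    "http://localhost:13000/v1/chat/completions",
    headers={"Authorization": f"Bearer {ONE_API_KEY}"},
    json={
        "model": "chatglm3",  # must match the model name set on the channel
        "messages": [{"role": "user", "content": "你好"}],
    },
    timeout=120,
)
print(resp.status_code)
print(resp.json())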

Deploy the m3e model with Docker. By default it runs on the CPU:
docker run -d -p 6200:6008 --name=m3e-large-api registry.cn-hangzhou.aliyuncs.com/fastgpt_docker/m3e-large-api:latest
To run it on the GPU:
docker run -d -p 6200:6008 --gpus all --name=m3e-large-api registry.cn-hangzhou.aliyuncs.com/fastgpt_docker/m3e-large-api:latest
Original image:
docker run -d -p 6200:6008 --name=m3e-large-api stawky/m3e-large-api:latest

Once it is running, test it; if a set of embedding vectors is returned, the deployment succeeded:

curl --location --request POST 'http://localhost:6200/v1/embeddings' \
--header 'Authorization: Bearer sk-aaabbbcccdddeeefffggghhhiiijjjkkk' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "m3e",
  "input": ["laf是什么"]
}'

3. Deploy FastGPT

FastGPT is also deployed on Linux, so Ubuntu 20 is used again. Open Code and create a new terminal.

Download the docker-compose file:

curl -O https://raw.githubusercontent.com/labring/FastGPT/main/files/deploy/fastgpt/docker-compose.yml

Download the config file:

curl -O https://raw.githubusercontent.com/labring/FastGPT/main/projects/app/data/config.json

Pull the images: docker-compose pull

Start the containers in the background: docker-compose up -d

Since FastGPT 4.6.8, the MongoDB replica set must be initialized manually:

# Check that the mongo container is running
docker ps
# Enter the container
docker exec -it mongo bash
# Connect to the database
mongo -u myname -p mypassword --authenticationDatabase admin
# Initialize the replica set. For access from outside, mongo:27017 can be changed to ip:27017,
# but the FastGPT connection string must then be changed to match
# (MONGODB_URI=mongodb://myname:mypassword@mongo:27017/fastgpt?authSource=admin => MONGODB_URI=mongodb://myname:mypassword@ip:27017/fastgpt?authSource=admin)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo:27017" }
  ]
})
# Check the status. If rs0 status is reported, the replica set is running
rs.status()

In the docker-compose file, set OPENAI_BASE_URL to http://localhost:13000/v1. This points to the One-API port; replace localhost with the host machine's address.

Also in the docker-compose file, set CHAT_API_KEY to the key copied from the One-API token.

Modify the config file; you can copy the following directly:

{
  "systemEnv": {
    "openapiPrefix": "fastgpt",
    "vectorMaxProcess": 15,
    "qaMaxProcess": 15,
    "pgHNSWEfSearch": 100
  },
  "llmModels": [
    {
      "model": "chatglm3",
      "name": "chatglm3",
      "maxContext": 4000,
      "maxResponse": 4000,
      "quoteMaxToken": 2000,
      "maxTemperature": 1,
      "vision": false,
      "defaultSystemChatPrompt": ""
    },
    {
      "model": "gpt-3.5-turbo-1106",
      "name": "gpt-3.5-turbo",
      "maxContext": 16000,
      "maxResponse": 4000,
      "quoteMaxToken": 13000,
      "maxTemperature": 1.2,
      "inputPrice": 0,
      "outputPrice": 0,
      "censor": false,
      "vision": false,
      "datasetProcess": false,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    },
    {
      "model": "gpt-3.5-turbo-16k",
      "name": "gpt-3.5-turbo-16k",
      "maxContext": 16000,
      "maxResponse": 16000,
      "quoteMaxToken": 13000,
      "maxTemperature": 1.2,
      "inputPrice": 0,
      "outputPrice": 0,
      "censor": false,
      "vision": false,
      "datasetProcess": true,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    },
    {
      "model": "gpt-4-0125-preview",
      "name": "gpt-4-turbo",
      "maxContext": 125000,
      "maxResponse": 4000,
      "quoteMaxToken": 100000,
      "maxTemperature": 1.2,
      "inputPrice": 0,
      "outputPrice": 0,
      "censor": false,
      "vision": false,
      "datasetProcess": false,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    },
    {
      "model": "gpt-4-vision-preview",
      "name": "gpt-4-vision",
      "maxContext": 128000,
      "maxResponse": 4000,
      "quoteMaxToken": 100000,
      "maxTemperature": 1.2,
      "inputPrice": 0,
      "outputPrice": 0,
      "censor": false,
      "vision": false,
      "datasetProcess": false,
      "toolChoice": true,
      "functionCall": false,
      "customCQPrompt": "",
      "customExtractPrompt": "",
      "defaultSystemChatPrompt": "",
      "defaultConfig": {}
    }
  ],
  "vectorModels": [
    {
      "model": "m3e",
      "name": "m3e",
      "price": 0.1,
      "defaultToken": 500,
      "maxToken": 1800
    },
    {
      "model": "text-embedding-ada-002",
      "name": "Embedding-2",
      "inputPrice": 0,
      "outputPrice": 0,
      "defaultToken": 700,
      "maxToken": 3000,
      "weight": 100,
      "defaultConfig": {}
    }
  ],
  "reRankModels": [],
  "audioSpeechModels": [
    {
      "model": "tts-1",
      "name": "OpenAI TTS1",
      "inputPrice": 0,
      "outputPrice": 0,
      "voices": [
        { "label": "Alloy", "value": "alloy", "bufferId": "openai-Alloy" },
        { "label": "Echo", "value": "echo", "bufferId": "openai-Echo" },
        { "label": "Fable", "value": "fable", "bufferId": "openai-Fable" },
        { "label": "Onyx", "value": "onyx", "bufferId": "openai-Onyx" },
        { "label": "Nova", "value": "nova", "bufferId": "openai-Nova" },
        { "label": "Shimmer", "value": "shimmer", "bufferId": "openai-Shimmer" }
      ]
    }
  ],
  "whisperModel": {
    "model": "whisper-1",
    "name": "Whisper1",
    "inputPrice": 0,
    "outputPrice": 0
  }
}

If every step succeeded, visit port localhost:3000 (again replacing localhost with the host machine's address) to reach the FastGPT page. The API for the local chatglm3 model must be running.
