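1. Each time you send a request that loads a model, set the `keep_alive` parameter to specify how long the model should stay in memory after the request.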
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "keep_alive": "24h"
}'
```
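The same request can be sent from Python. The sketch below uses only the standard library (`urllib.request`) instead of `requests`; the helper names `build_generate_payload` and `generate` are hypothetical, and the endpoint is the one used in the curl example. According to Ollama's FAQ, `keep_alive` also accepts a number of seconds, a negative value (keep the model loaded indefinitely), or `0` (unload right after the response); check this against the Ollama version you run.

```python
import json
import urllib.request

# Endpoint taken from the curl example above
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model, prompt, keep_alive="24h", stream=False):
    """Build the JSON body for /api/generate (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        # How long Ollama should keep the model in memory after this request
        "keep_alive": keep_alive,
    }


def generate(payload, url=OLLAMA_URL, timeout=600):
    """POST the payload and return the decoded JSON reply.

    Requires a running Ollama server; not called when this file is run directly.
    """
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_generate_payload("llama2", "Why is the sky blue?")
    # Prints the same JSON body that the curl example sends
    print(json.dumps(payload))
```

Running the file prints `{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false, "keep_alive": "24h"}`; call `generate(payload)` to actually send it to a local server.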
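2. Alternatively, send a request that reloads the model every 280 seconds. By default Ollama unloads a model from memory after it has been idle for five minutes, and a request for a model that is already in memory returns in about 1 ms, so this approach is also cheap: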
```python
import time
from datetime import datetime

import pytz
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"


def get_bj_time():
    """Return the current Beijing time as a formatted string."""
    beijing_tz = pytz.timezone("Asia/Shanghai")
    return datetime.now(beijing_tz).strftime("%Y-%m-%d %H:%M:%S")


while True:
    # A request with a model name but no prompt loads the model into memory
    # and resets its unload timer to keep_alive (5 minutes here).
    data = {"model": "qwen:7b", "keep_alive": "5m"}
    headers = {"Content-Type": "application/json"}

    start = time.perf_counter()
    response = requests.post(OLLAMA_URL, json=data, headers=headers)
    elapsed = time.perf_counter() - start

    print(f"Request time (ms): {elapsed * 1000:.6f}")
    print(response.content.decode("utf-8"))  # convert bytes to str for printing
    print(f"Current Beijing time: {get_bj_time()}")
    time.sleep(280)  # wait 280 seconds, then send the next request
```
Measured load times:

- 7B: first load 3.867187177 s, second load 0.766666 ms
- 14B: first load 5.180146173 s, second load 0.753414 ms
- 72B: first load 16.991763358 s, second load 1.358505 ms
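Why 280 seconds works: each request resets the unload timer to the `keep_alive` duration (5 minutes in the script), so any ping interval shorter than 300 s keeps the model resident. The sketch below is a simplified model of that timer, not Ollama's actual scheduler; `model_stays_loaded` is a hypothetical helper, and it assumes the timer restarts when each request finishes and that reaching the deadline exactly counts as unloaded.

```python
def model_stays_loaded(ping_interval_s, keep_alive_s=300, horizon_s=24 * 3600):
    """Simulate one ping every ping_interval_s seconds over horizon_s seconds.

    Simplified timer model (an assumption, not Ollama internals):
    the model is unloaded once keep_alive_s seconds pass with no request,
    and every request pushes the deadline to now + keep_alive_s.
    """
    if ping_interval_s <= 0:
        raise ValueError("ping_interval_s must be positive")
    deadline = keep_alive_s  # the first ping at t = 0 loads the model
    t = 0
    while t + ping_interval_s <= horizon_s:
        t += ping_interval_s  # time of the next ping
        if t >= deadline:  # conservative: reaching the deadline means unloaded
            return False
        deadline = t + keep_alive_s  # the ping resets the unload timer
    return True


if __name__ == "__main__":
    for interval in (280, 299, 300, 320):
        print(interval, model_stays_loaded(interval))
```

With the 5-minute `keep_alive`, the 280 s interval leaves a 20 s safety margin; under this model an interval of 300 s or more lets the model unload between pings, after which the next request pays the full first-load time again.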