
Using FastChat for high-concurrency LLM chat


Running the commands manually

Start the controller first, then the four model workers (two per GPU, each serving ChatGLM2-6B in 8-bit), and finally the Gradio web server. Each command blocks, so run them in separate terminals; the script in the next section automates this:

python3 -m fastchat.serve.controller
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory "44GiB" --port 31001 --worker-address http://localhost:31001 --load-8bit
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory "44GiB" --port 31002 --worker-address http://localhost:31002 --load-8bit
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory "44GiB" --port 31003 --worker-address http://localhost:31003 --load-8bit
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory "44GiB" --port 31004 --worker-address http://localhost:31004 --load-8bit
python3 -m fastchat.serve.gradio_web_server --concurrency-count=150
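The launch order matters: workers can only register once the controller is up, and the web server discovers models through the controller. Once everything is running, you can confirm that all four workers registered. Below is a minimal sketch, assuming the controller listens on FastChat's default port 21001 (adjust the URL if you changed it):

# Minimal registration check against the FastChat controller.
# Assumes the controller runs on its default port 21001.
import requests

CONTROLLER_URL = "http://localhost:21001"

# Ask the controller to re-poll every registered worker...
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")

# ...then list the model names it currently knows about.
models = requests.post(f"{CONTROLLER_URL}/list_models").json()["models"]
print(models)  # expect the ChatGLM2_6B model served by the four workers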

One-click startup script

import subprocess
import multiprocessing
import time

def execute_command(command):
    # Run one launch command in a shell; surface any non-zero exit code.
    try:
        subprocess.run(command, shell=True, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Error executing command: {e}")

if __name__ == "__main__":
    # Same launch order as the manual steps: controller, four workers, web UI.
    scripts = [
        "python3 -m fastchat.serve.controller",
        "CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory 44GiB --port 31001 --worker-address http://localhost:31001 --load-8bit",
        "CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory 44GiB --port 31002 --worker-address http://localhost:31002 --load-8bit",
        "CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory 44GiB --port 31003 --worker-address http://localhost:31003 --load-8bit",
        "CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path /home/NLP/LLM/pretrained_model/LanguageModels/ChatGLM2_6B --limit-worker-concurrency 100 --max-gpu-memory 44GiB --port 31004 --worker-address http://localhost:31004 --load-8bit",
        "python3 -m fastchat.serve.gradio_web_server --concurrency-count=150"
    ]
    processes = []
    start_time = 10  # seconds to wait after launching the first component
    add_time = 5     # extra seconds added to the wait after each launch
    for script in scripts:
        # One child process per command so the servers run side by side.
        p = multiprocessing.Process(target=execute_command, args=(script,))
        processes.append(p)

    for p in processes:
        p.start()
        # Give each component time to come up before starting the next one;
        # the wait grows by `add_time` seconds with every launch.
        time.sleep(start_time)
        start_time += add_time

    # The servers run until interrupted, so these joins block indefinitely.
    for p in processes:
        p.join()
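The fixed start_time/add_time delays are a heuristic: if the model takes longer than expected to load, a worker may start before the controller is reachable. A more robust variant polls each component's port before launching the next. The helper below is a sketch, not FastChat API; the port numbers (controller 21001, workers 31001-31004, Gradio 7860 by default) are assumptions matching the commands above:

# Sketch of a readiness check to replace the fixed sleeps above.
import socket
import time

def wait_for_port(port, host="localhost", timeout=120):
    """Poll until host:port accepts TCP connections or timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(1)
    return False

# Example: after p.start() for the controller, block on wait_for_port(21001)
# before starting the first worker, and likewise for ports 31001-31004.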
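To exercise the concurrency settings end to end, one option is to also start FastChat's OpenAI-compatible API server (python3 -m fastchat.serve.openai_api_server) and fire parallel chat requests at it. The sketch below leans on assumptions: the API server port (8000) and the model name ("ChatGLM2_6B", usually derived from the last component of the model path) may differ on your setup.

# Hedged concurrency smoke test against FastChat's OpenAI-compatible
# API server. Assumes it was started with something like:
#   python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
# The model name "ChatGLM2_6B" is an assumption based on the model path.
from concurrent.futures import ThreadPoolExecutor
import requests

API_URL = "http://localhost:8000/v1/chat/completions"

def ask(i):
    payload = {
        "model": "ChatGLM2_6B",
        "messages": [{"role": "user", "content": f"Say hello, request {i}"}],
        "max_tokens": 32,
    }
    r = requests.post(API_URL, json=payload, timeout=120)
    return r.status_code

# Send 20 chat requests in parallel and report the HTTP status codes.
with ThreadPoolExecutor(max_workers=20) as pool:
    print(list(pool.map(ask, range(20))))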