当前位置:   article > 正文

并发调用deepseek API,构建多轮对话数据_deepseek并发

deepseek并发

现在大模型领域非常魔幻一件事,是调用友商开源的大模型构建自己的数据集,大家相互调用, 数据同源导致同样的问题回答内容也差不多,也难怪大家会质疑某些大模型是套壳gpt了,看来只有能积累原始数据的公司才能最终活下来。

这里就演示下如何用多进程调用商用大模型构建sft数据集

# -*- coding: utf-8 -*-
# @Time    : 2024/6/19 上午10:33
# @Author  : yblir
# @File    : xiaohuangya_deepseek_json.py
# explain  : 
# =======================================================
import json
import re
import sys

from openai import OpenAI
from loguru import logger
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
# from queue import Queue
# from multiprocessing import Queue, cpu_count, shared_memory, Process

work_nums = 100
# img_queue = Queue(1000)

client = OpenAI(api_key="这里填入自己购买的key", base_url="https://api.deepseek.com/")

def find_all_index(str_, pattern):
    patt = re.compile(pattern, re.IGNORECASE)
    pos = [item.start() for item in patt.finditer(str_)]
    return pos


def run(question_, index):
    new_line_ = {}
    history_ = []
    msg1 = '结合TABLE信息,分析TABLE表结构,question的查询目标,查询条件和查询逻辑这4项内容。'
    messages = [
        {"role": "system", "content": "你是个数据库专家,帮忙解答用户提出的问题"},
        {"role": "user", "content": f"{question_}, {msg1}"},
    ]
    response = client.chat.completions.create(
            model="deepseek-coder",
            messages=messages
    )
    result = response.choices[0].message.content
    history_.append([f"{question_}, {msg1}", result])
    # right_sql = deepseek_sql_create(question)
    # print(result)

    msg2 = "根据你的流程分析,针对给出的TABLE,写出能表达当前question意思的sql语句"
    messages.append({'role': 'assistant', 'content': result})
    messages.append(
            {'role': 'user', 'content': msg2})
    response = client.chat.completions.create(
            model="deepseek-coder",
            messages=messages
    )
    result = response.choices[0].message.content
    history_.append([msg2, result])
    # print('--------------------------------------------')
    # print(result)
    msg3 = '分析下你写的sql语句是否正确,如果你认为有不合理的地方,指出并改正。最后,给出你认为正确的sql语句'
    messages.append({'role': 'assistant', 'content': result})
    messages.append({'role'   : 'user',
                     'content': msg3})
    response = client.chat.completions.create(
            model="deepseek-coder",
            messages=messages
    )
    result = response.choices[0].message.content
    # print('--------------------------------------------')
    # print(result)

    new_line_['instruction'] = msg3
    new_line_['input'] = ''
    new_line_['output'] = result
    new_line_['history'] = history_
    logger.info(f'已完成数量:{index}')
    return new_line_


if __name__ == '__main__':
    with open('tuning_sample.json', 'r', encoding='utf-8') as f:
        data = json.load(f)
    print(len(data))

    j = 6
    with ProcessPoolExecutor(max_workers=work_nums) as p:
        futures = [p.submit(run, line['instruction'], i + 1) for i, line in enumerate(data) if i + 1 > 1000]

        # 获取结果
        new_data_list = []
        k = 0
        for future in as_completed(futures):
            result = future.result()
            new_data_list.append(result)

            k += 1
            # 每抽取200条数据保存一次
            if k % 200 == 0:
                json_data = json.dumps(new_data_list, indent=4, ensure_ascii=False)
                with open(f'duck_sql_{j}.json', 'w', encoding='utf-8') as f:
                    f.write(json_data)
                j += 1
                new_data_list = []

        json_data = json.dumps(new_data_list, indent=4, ensure_ascii=False)
        with open(f'duck_sql_extra.json', 'w', encoding='utf-8') as f:
            f.write(json_data)

        logger.success('json 写入成功')

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/运维做开发/article/detail/878718
推荐阅读
相关标签
  

闽ICP备14008679号