OpenAI大模型实战：翻译应用深度探索与二次开发_开源大模型翻译功能

作者：我家自动化 | 2024-07-20 07:50:44

踩

开源大模型翻译功能

最近在极客时间学习《AI 大模型应用开发实战营》，自己一边跟着学一边开发了一个进阶版本的 OpenAI-Translator，在这里简单记录下开发过程和心得体会，供有兴趣的同学参考：

V1.0项目实现的功能

支持PDF文件格式解析
支持英文翻译成中文
支持OpenAI和ChatGLM模型
通过yaml文件或命令行参数进行配置

其中PDF文件的解析是通过开源代码库pdfplumber实现的，可以读取出输入的pdf文件的元数据，以及页面对象，然后在针对页面对象，调用相应方法，可以解析出的文本数据和表格数据。

而英文翻译成中文的过程主要通过调用大模型来实现，其中prompt的设计是重中之重，决定了翻译的效果。

V2.0 实现的功能

在大模型的调用方面，支持了Claude-API的调用

Claude-API虽然官方没有正式公布，但是开源社区的贡献者最新提供了代码包，以实现通过api和Claude大模型对话

首先，需要查看并拷贝出来网页端访问的cookie

并依据cookie，初始化Client (Client为代码库claude_api.py中声明的类)

claude_api = Client(cookie)
1

然后根据prompt和conversation_id参数，就可以和Claude大模型进行对话

prompt = "<Your prompt content>"
conversation_id = "<conversation_id>" or claude_api.create_new_chat()['uuid']
response = claude_api.send_message(prompt, conversation_id)
print(response)
1
2
3
4

由此在项目代码中，我们新建了一个claude_model.py

class ClaudeModel(Model):
    def __init__(self, cookie):
        self.cookie = cookie
        self.claude_api = Client(cookie)
        self.model = "claude"

    def make_request(self, sys_prompt, user_prompt, conversation_id):
        attempts = 0
        prompt = sys_prompt + user_prompt
        while attempts < 3:
            try:
                if self.model == 'claude':
                    response = self.claude_api.send_message(prompt, conversation_id)
                    # translation = response.func()
                return response, True
            except requests.exceptions.RequestException as e:
                raise Exception(f"请求异常：{e}")
            except requests.exceptions.Timeout as e:
                raise Exception(f"请求超时：{e}")
            except Exception as e:
                raise Exception(f"发生了未知错误：{e}")
        return "", False

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

并在main.py中支持对ClaudeModel的初始化和调用

cookie = '<Your cookie>'

# model = OpenAIModel(model=model_name, api_key=api_key) # 调用GPT模型
model = ClaudeModel(cookie)  # 调用Claude模型
1
2
3
4

prompt设计方面，加入了system message，并支持更据翻译内容的领域特性，灵活地赋予相应角色

在V1.0版本的基础上，丰富了prompt信息

def make_text_prompt(self, text: str, target_language: str) -> str:
        return f"Your task is to provide a clear and concise translation that accurately conveys the meaning of the original text {text}, tailored to the intended {target_language} audience. \
        Please be sure to accurately translate any specific terms or jargon. "
    
def make_table_prompt(self, table: str, target_language: str) -> str:
        return f"Your task is to provide a clear and concise translation that accurately conveys the meaning of the original text in the given table \n{table} into {target_language}  \
                and output the translation result with the same tabular format, maintaining the rows, columns and spacings "

def make_sys_prompt(self, role:str) -> str:
        return f"As a distinguished {role}, you specialize in accurately and appropriately translating texts into any language, meticulously considering all linguistic complexities and nuances."

1
2
3
4
5
6
7
8
9
10
11

在system message中可以根据不同的领域，传入不同的角色作为参数，默认设置为 multilingual scholar and expert translator，还根据需要指明领域，如history expert等

由于token上限等原因，采用一个实例来检验效果

利用gradio构建可视化界面

def launch_gradio():

    iface = gr.Interface(
        fn=translation,
        title="OpenAI-Translator v2.0(PDF 电子书翻译工具)",
        inputs=[
            gr.File(label="上传PDF文件"),
            gr.Textbox(label="源语言（默认：英文）", placeholder="English", value="English"),
            gr.Textbox(label="目标语言（默认：中文）", placeholder="Chinese", value="Chinese"),
            gr.Textbox(label="文本类型（默认：文学）", placeholder="Chinese", value="Literature")
        ],
        outputs=[
            gr.File(label="下载翻译文件")
        ],
        allow_flagging="never"
    )

    iface.launch(share=True, server_name="0.0.0.0")

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

效果如图所示

我们从BBC news中选取了一段时间为一分钟的新闻稿用作测试，新闻文本如下：

在翻译的prompt中，提示模型在处理人名时，如果不确定译法，可以保留为源语言的形式，翻译效果，如下图所示

可见人名虽然保持了，但其他专有名词如abaya，在新闻网站的翻译中，被译为“长袍”，而模型将其翻译为音译的阿巴亚，可见还可以对其他专有名词的处理做更多考虑，以及用专业的新闻语料知识库做相应的微调

参考资料

[1]
GitHub - KoushikNavuluri/Claude-API: This project provides an unofficial API for Claude AI, allowing users to access and interact with Claude AI .

[2]
https://flowgpt.com/?searchQuery=book+pdf+translation

[3]
AI Prompt Generator | AI Mind

[4]
https://github.com/gradio-app/gradio

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/我家自动化/article/detail/855840?site

OpenAI大模型实战：翻译应用深度探索与二次开发_开源大模型翻译功能

V1.0项目实现的功能

V2.0 实现的功能

利用gradio构建可视化界面

效果如图所示

参考资料