赞
踩
最近在极客时间学习《AI 大模型应用开发实战营》,自己一边跟着学一边开发了一个进阶版本的 OpenAI-Translator,在这里简单记录下开发过程和心得体会,供有兴趣的同学参考:
其中PDF文件的解析是通过开源代码库pdfplumber实现的,可以读取出输入的pdf文件的元数据,以及页面对象,然后在针对页面对象,调用相应方法,可以解析出的文本数据和表格数据。
而英文翻译成中文的过程主要通过调用大模型来实现,其中prompt的设计是重中之重,决定了翻译的效果。
Claude-API虽然官方没有正式公布,但是开源社区的贡献者最新提供了代码包,以实现通过api和Claude大模型对话
首先,需要查看并拷贝出来网页端访问的cookie
并依据cookie,初始化Client (Client为代码库claude_api.py中声明的类)
claude_api = Client(cookie)
然后根据prompt和conversation_id参数,就可以和Claude大模型进行对话
- prompt = "<Your prompt content>"
- conversation_id = "<conversation_id>" or claude_api.create_new_chat()['uuid']
- response = claude_api.send_message(prompt, conversation_id)
- print(response)
由此在项目代码中,我们新建了一个claude_model.py
- class ClaudeModel(Model):
- def __init__(self, cookie):
- self.cookie = cookie
- self.claude_api = Client(cookie)
- self.model = "claude"
-
- def make_request(self, sys_prompt, user_prompt, conversation_id):
- attempts = 0
- prompt = sys_prompt + user_prompt
- while attempts < 3:
- try:
- if self.model == 'claude':
- response = self.claude_api.send_message(prompt, conversation_id)
- # translation = response.func()
- return response, True
- except requests.exceptions.RequestException as e:
- raise Exception(f"请求异常:{e}")
- except requests.exceptions.Timeout as e:
- raise Exception(f"请求超时:{e}")
- except Exception as e:
- raise Exception(f"发生了未知错误:{e}")
- return "", False

并在main.py中支持对ClaudeModel的初始化和调用
- cookie = '<Your cookie>'
-
- # model = OpenAIModel(model=model_name, api_key=api_key) # 调用GPT模型
- model = ClaudeModel(cookie) # 调用Claude模型
在V1.0版本的基础上,丰富了prompt信息
- def make_text_prompt(self, text: str, target_language: str) -> str:
- return f"Your task is to provide a clear and concise translation that accurately conveys the meaning of the original text {text}, tailored to the intended {target_language} audience. \
- Please be sure to accurately translate any specific terms or jargon. "
-
- def make_table_prompt(self, table: str, target_language: str) -> str:
- return f"Your task is to provide a clear and concise translation that accurately conveys the meaning of the original text in the given table \n{table} into {target_language} \
- and output the translation result with the same tabular format, maintaining the rows, columns and spacings "
-
- def make_sys_prompt(self, role:str) -> str:
- return f"As a distinguished {role}, you specialize in accurately and appropriately translating texts into any language, meticulously considering all linguistic complexities and nuances."
在system message中可以根据不同的领域,传入不同的角色作为参数,默认设置为 multilingual scholar and expert translator,还根据需要指明领域,如history expert等
由于token上限等原因,采用一个实例来检验效果
- def launch_gradio():
-
- iface = gr.Interface(
- fn=translation,
- title="OpenAI-Translator v2.0(PDF 电子书翻译工具)",
- inputs=[
- gr.File(label="上传PDF文件"),
- gr.Textbox(label="源语言(默认:英文)", placeholder="English", value="English"),
- gr.Textbox(label="目标语言(默认:中文)", placeholder="Chinese", value="Chinese"),
- gr.Textbox(label="文本类型(默认:文学)", placeholder="Chinese", value="Literature")
- ],
- outputs=[
- gr.File(label="下载翻译文件")
- ],
- allow_flagging="never"
- )
-
- iface.launch(share=True, server_name="0.0.0.0")

我们从BBC news中选取了一段时间为一分钟的新闻稿用作测试,新闻文本如下:
在翻译的prompt中,提示模型在处理人名时,如果不确定译法,可以保留为源语言的形式,翻译效果,如下图所示
可见人名虽然保持了,但其他专有名词如abaya,在新闻网站的翻译中,被译为“长袍”,而模型将其翻译为音译的阿巴亚,可见还可以对其他专有名词的处理做更多考虑,以及用专业的新闻语料知识库做相应的微调
[2] https://flowgpt.com/?searchQuery=book+pdf+translation
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。