Large Model Series: OpenAI Usage Tips: Correcting Whisper Transcription Misspellings, Prompt vs. Post-processing

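Correcting transcription misspellings: prompt vs. post-processing

We are addressing the problem of improving transcription accuracy, particularly for company names and product references. Our solution is a two-pronged strategy that combines Whisper's prompt parameter with GPT-4's post-processing capabilities.

The two approaches to correcting inaccuracies are:

  • We pass a list of correct spellings directly to Whisper's prompt parameter to guide the initial transcription.

  • We use GPT-4 to fix misspellings after transcription, again supplying the same list of correct spellings in the prompt.

These strategies aim to ensure accurate transcription of unfamiliar proper nouns.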
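
Both approaches can be driven from a single list of correct spellings. The following is a minimal sketch, assuming hypothetical helper names (`CORRECT_SPELLINGS`, `build_whisper_prompt`, `build_spellcheck_system_prompt`) that are not part of the original notebook. The wording of the system message mirrors the one used later in this article.

```python
# Hypothetical helpers (not from the original notebook): keep one list of
# correct spellings and derive both prompts from it.
CORRECT_SPELLINGS = [
    "ZyntriQix", "Digique Plus", "CynapseFive", "VortiQore V8",
    "EchoNix Array", "OrbitalLink Seven", "DigiFractal Matrix",
    "PULSE", "RAPT", "B.R.I.C.K.", "Q.U.A.R.T.Z.", "F.L.I.N.T.",
]


def build_whisper_prompt(names):
    """Approach 1: a comma-separated list passed as Whisper's prompt."""
    return ", ".join(names)


def build_spellcheck_system_prompt(names, company="ZyntriQix"):
    """Approach 2: a system message for GPT-4 post-processing."""
    return (
        f"You are a helpful assistant for the company {company}. "
        "Your task is to correct any spelling discrepancies in the transcribed text. "
        "Make sure that the names of the following products are spelled correctly: "
        + ", ".join(names)
    )
```

Keeping the list in one place means a new product name only has to be added once for both strategies to see it.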

Setup

To get started:

  • Import the OpenAI Python library (if you don't have it, install it with pip install openai)
  • Download the example audio file
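# Import the required libraries
import openai  # used to call the OpenAI API
import urllib.request  # used to download the example audio file (urlretrieve lives in urllib.request)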

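# Remote URL of the example audio file
ZyntriQix_remote_filepath = "https://cdn.openai.com/API/examples/data/ZyntriQix.wav"

# Local path where the file will be saved
ZyntriQix_filepath = "data/ZyntriQix.wav"

# Download the example audio file with urllib.request.urlretrieve and save it locally
# (urlretrieve does not create the target directory, so "data/" must already exist)
urllib.request.urlretrieve(ZyntriQix_remote_filepath, ZyntriQix_filepath)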
('data/ZyntriQix.wav', <http.client.HTTPMessage at 0x10559a910>)

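Setting our baseline with a fictitious audio recording

Our reference point is a monologue that ChatGPT generated from prompts given by the author. The author then read the content aloud. So the author both guided ChatGPT's output through prompts and brought it to life by reading it.

Our fictitious company, ZyntriQix, offers a range of tech products, including Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also lead several initiatives, such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.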

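# Define a wrapper function to see how the prompt affects the transcription
def transcribe(prompt: str, audio_filepath) -> str:
    """Given a prompt, transcribe the audio file."""
    # Open the audio file in binary mode; the with-block closes it after the request
    with open(audio_filepath, "rb") as audio_file:
        # Call OpenAI's audio transcription endpoint
        transcript = openai.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",  # transcribe with the whisper-1 model
            prompt=prompt,  # prompt used to guide the transcription
        )
    return transcript.text  # return the transcribed text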
# Transcribe the audio file with an empty prompt to establish a baseline
# (audio_filepath points to the example file, ZyntriQix_filepath)
transcribe(prompt="", audio_filepath=ZyntriQix_filepath)
"Have you heard of ZentricX? This tech giant boasts products like Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the Pulse framework, Wrapped system, they've developed a brick infrastructure court system, and launched the Flint initiative, all highlighting their commitment to relentless innovation. ZentricX, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

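Whisper transcribed our company name and product names incorrectly and miscapitalized our acronyms. Let's pass the correct names as a list in the prompt.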

# Call transcribe again, this time with a prompt
# prompt: the list of correctly spelled names that guides the transcription
# audio_filepath: path to the audio file
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a B.R.I.C.K. infrastructure, Q.U.A.R.T. system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

When we pass a much longer list of product names, some of the names are transcribed correctly while others are still misspelled.

# Call transcribe with a much longer prompt
# prompt: the extended product list, including names that never occur in the audio
# audio_filepath: path to the audio file
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZentricX? This tech giant boasts products like DigiCube Plus, Synapse 5, VortiCore V8, EchoNix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a brick infrastructure court system and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZentricX in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

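You can use GPT-4 to fix spelling mistakes

GPT-4 is especially useful when the speech content is not known in advance and we have a list of product names at hand.

Post-processing with GPT-4 is more scalable than relying solely on Whisper's prompt parameter, which is limited to roughly 224 tokens (OpenAI's speech-to-text guide states that only the final 224 tokens of the prompt are considered). GPT-4 lets us process larger lists of correct spellings, making it a more robust method for handling extensive product lists.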
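
Because Whisper's prompt is length-limited, a long list may need trimming before it is sent. The sketch below shows one way to do that. `fit_names_to_budget`, the `count_tokens` callback, and the default budget of 224 tokens are assumptions for illustration. The default counter is a crude characters-per-token estimate, not Whisper's tokenizer, so a real tokenizer should be swapped in when accurate counts matter.

```python
def rough_token_count(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def fit_names_to_budget(names, max_tokens=224, count_tokens=rough_token_count):
    """Keep names, in order, until the comma-separated prompt would exceed max_tokens.

    Returns (kept, dropped) so the caller can see which names did not fit.
    """
    kept = []
    for index, name in enumerate(names):
        candidate = ", ".join(kept + [name])
        if count_tokens(candidate) > max_tokens:
            # This name and everything after it is left out of the prompt.
            return kept, list(names[index:])
        kept.append(name)
    return kept, []
```

Since names are kept in list order, placing the terms most likely to be spoken at the front of the list gives them priority.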

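However, this post-processing technique is not without limits. It is constrained by the context window of the chosen model, which may pose challenges when dealing with a very large number of unique terms. For instance, a company with thousands of SKUs may find that GPT-4's context window is insufficient for its needs and may need to explore alternative solutions.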
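
One possible alternative for very large catalogs, offered here only as an illustration and not something the original experiment tested, is to shortlist the names that resemble phrases in the raw transcript before building the GPT-4 prompt. The sketch uses Python's standard `difflib` module. `candidate_phrases`, `shortlist_names`, and the 0.6 cutoff are assumptions.

```python
import difflib
import re


def candidate_phrases(text: str, max_words: int = 3) -> set:
    """Collect every run of 1..max_words consecutive words from the transcript."""
    words = re.findall(r"[A-Za-z0-9+.\-]+", text)
    phrases = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def shortlist_names(transcript: str, names, cutoff: float = 0.6) -> list:
    """Return the names that fuzzily match at least one phrase in the transcript."""
    phrases = [p.lower() for p in candidate_phrases(transcript)]
    shortlisted = []
    for name in names:
        # get_close_matches returns phrases whose similarity ratio is >= cutoff
        if difflib.get_close_matches(name.lower(), phrases, n=1, cutoff=cutoff):
            shortlisted.append(name)
    return shortlisted
```

Only the shortlisted names would then go into the system prompt. A lower cutoff keeps more names and a higher one risks dropping a name that Whisper garbled badly, so the threshold would need tuning on real transcripts.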

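Interestingly, GPT-4 post-processing appears to be more reliable than using Whisper alone. Drawing on the product list in this way makes our results more dependable. This added reliability comes at a price, however, since the extra model call can increase cost and latency.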

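# Define a wrapper function that transcribes the audio and then spell-checks the result
def transcribe_with_spellcheck(system_message, audio_filepath):
    # Call the OpenAI chat completions API with the gpt-4 model
    completion = openai.chat.completions.create(
        model="gpt-4",  # model to use
        temperature=0,  # 0 minimizes randomness so the output is as repeatable as possible
        messages=[
            {"role": "system", "content": system_message},  # system message with the instructions
            {
                "role": "user",
                # transcribe the audio with an empty Whisper prompt and pass the text as the user message
                "content": transcribe(prompt="", audio_filepath=audio_filepath),
            },
        ],
    )
    # Return the corrected text
    return completion.choices[0].message.content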
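
The function above re-runs Whisper on every call. When trying several system prompts against the same recording, it can help to transcribe once and reuse the text. The sketch below separates message construction from the API call. `build_spellcheck_messages` and `spellcheck_transcript` are hypothetical names, and the `client` argument is assumed to be the `openai` module (or a compatible client object) used elsewhere in this article.

```python
def build_spellcheck_messages(system_message: str, transcript: str) -> list:
    """Build the chat messages: instructions as system, raw transcript as user."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": transcript},
    ]


def spellcheck_transcript(client, system_message: str, transcript: str,
                          model: str = "gpt-4") -> str:
    """Post-process an existing transcript without calling Whisper again."""
    completion = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=build_spellcheck_messages(system_message, transcript),
    )
    return completion.choices[0].message.content
```

With this split, the raw transcript can be stored once and passed to `spellcheck_transcript` with each candidate system prompt, which avoids paying for repeated transcription.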

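Now, let's feed the original product list to GPT-4 and evaluate its performance. Our goal is to assess the model's ability to spell proprietary product names correctly without prior knowledge of the exact terms that appear in the transcription. In our experiment, GPT-4 spelled our product names correctly, confirming its potential as a reliable tool for ensuring transcription accuracy.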

# Define system_prompt, which describes the spell-checking task and lists the correct spellings
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."

# Call transcribe_with_spellcheck with system_prompt and the audio file path; it returns the spell-corrected text
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)

# Print the corrected text
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?
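
A quick way to check a corrected transcript is to see which names from the list actually appear in it. The sketch below is a hypothetical helper pair (`names_present`, `absent_names`) that does exact, case-sensitive substring matching. A name that is absent may simply not have been spoken, so the result is a cue for manual review rather than a verdict.

```python
def names_present(text: str, names) -> dict:
    """Map each name to True if it occurs verbatim (case-sensitive) in text."""
    return {name: name in text for name in names}


def absent_names(text: str, names) -> list:
    """List the names that do not occur verbatim in text, in their original order."""
    return [name for name in names if name not in text]
```

Applied to the output above with the twelve-name list from the system prompt, this check reports Q.U.A.R.T.Z. as absent: the phrase "infrastructure court system" was left unchanged.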

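In this case, we provide a comprehensive product list that includes all the previously used spellings along with additional new names. This simulates a real-life scenario in which we have a sizable SKU list and are uncertain which exact terms will appear in the transcription. Feeding this extensive list of product names into the system still produces a correctly transcribed output.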

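# Define the system prompt: it tells the model to correct spelling discrepancies in the text and to make sure the listed product names are spelled correctly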
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."

# Call transcribe_with_spellcheck with the system prompt and the audio file path ZyntriQix_filepath
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)

# Print the corrected text
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?

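We are using GPT-4 as a spell checker, with the same list of correct spellings that was used earlier in the prompt. This time the system prompt asks the model to first list and count the misspelled words, and then to substitute the correct ones.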

# Define the system prompt: list and count the misspelled words first, then replace them using the list
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your first task is to list the words that are not spelled correctly according to the list provided to you and to tell me the number of misspelled words. Your next task is to insert those correct words in place of the misspelled ones. List: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."

# Call transcribe_with_spellcheck: transcribe the audio, then spell-check the transcript using the system prompt
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)

# Print the model's response
print(new_text)
The misspelled words are: ZentricX, Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, Orbital Link 7, Digifractal Matrix, Pulse, Wrapped, brick, Flint, and 30. The total number of misspelled words is 12.

The corrected paragraph is:

Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just MetaSync Thirty years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?
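
In the output above, the model also replaced the number "30" with the product name "MetaSync Thirty", which is an over-correction. A word-level diff between the raw and corrected text makes such changes easy to review. This is a minimal sketch using the standard `difflib` module. The function names and the purely-numeric heuristic are assumptions, and the corrected paragraph is assumed to have been separated from the model's preamble beforehand.

```python
import difflib


def word_replacements(before: str, after: str) -> list:
    """Return (old, new) pairs for every word span that differs between the texts."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


def numeric_overcorrections(changes) -> list:
    """Flag changes whose original span was a plain number (heuristic, an assumption)."""
    return [(old, new) for old, new in changes if old.strip(".,;:!?").isdigit()]
```

Every pair returned by `word_replacements` can be shown to a reviewer, while `numeric_overcorrections` singles out cases like the "30" above, where a plain number was rewritten and probably should not have been.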