
snowboy + next-generation Kaldi (k2-fsa) sherpa-onnx for offline speech recognition [Voice Assistant]

sherpa-onnx

Background

This series aims to implement the basic features of a smart speaker: wake-word detection, speech recognition (speech-to-text), handling user requests (e.g. weather queries, implemented with custom intents in rasa), and speech synthesis (text-to-speech).

Speech recognition and speech synthesis are implemented offline.

Speech recognition uses sherpa-onnx, which supports offline recognition of both Chinese and English.

Some of the packages used in this article were already partially built in the prerequisites section of the snowboy post. Once offline speech recognition is installed, the relevant code will also be added to the snowboy project, so that after wake-up the assistant calls speech recognition to transcribe what the user says.

Wake-word article:

snowboy custom wake word for voice wake-up [Voice Assistant] (殷长庆的博客 - CSDN博客)

References

sherpa-onnx tutorial (it is strongly recommended to follow the official installation steps):

Installation — sherpa 1.3 documentation

Pre-trained models for sherpa-onnx:

Pre-trained models — sherpa 1.3 documentation

Practice

Download and build sherpa-onnx

cd /home/test
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j6

After the build completes, you will find the sherpa-onnx executables in the build/bin directory.

Download a pre-trained model

I chose the offline paraformer model because it supports offline recognition of both Chinese and English. This offline recognition works on wav audio files, which is exactly what we need.

See the official documentation:

Paraformer models — sherpa 1.3 documentation

The steps are as follows:

cd /home/test/sherpa-onnx
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28
cd sherpa-onnx-paraformer-zh-2023-03-28
git lfs pull --include "*.onnx"

Check that the download succeeded; pay attention to the sizes of the model files:

sherpa-onnx-paraformer-zh-2023-03-28$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root 214M Apr  1 07:28 model.int8.onnx
-rw-r--r-- 1 kuangfangjun root 824M Apr  1 07:28 model.onnx

There are two model files. In my local tests the difference between them was not large, so I chose the int8 version.
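If you prefer to script this size check (for example before copying the model to a device), the listing above can be reproduced with a few lines of Python. This is just a sketch; the directory path is the one assumed from the download steps, so adjust it to your setup:

```python
import os

def list_onnx_models(model_dir):
    """Return {filename: size in MiB} for every .onnx file in model_dir."""
    sizes = {}
    for name in sorted(os.listdir(model_dir)):
        if name.endswith(".onnx"):
            path = os.path.join(model_dir, name)
            sizes[name] = os.path.getsize(path) / (1024 * 1024)
    return sizes

# Example (path assumed from the steps above):
# for name, mb in list_onnx_models(
#     "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28"
# ).items():
#     print(f"{name}: {mb:.0f} MiB")
```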

Test speech recognition

Let's test the recognition:

cd /home/test/sherpa-onnx
./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
  --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
  ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav

If the expected transcription is printed, the speech recognition setup is complete.

Integrating with snowboy

The python-api-examples directory of sherpa-onnx contains the Python API examples. The one we need is offline-decode-files.py, whose main() function performs offline recognition of a wav file.

Next we make a few small changes to that file: configure the model's default parameters, and return the recognized text once decoding finishes.

offlinedecode.py

Rename offline-decode-files.py to offlinedecode.py, or create a new offlinedecode.py file:

touch offlinedecode.py
vim offlinedecode.py

Edit the file contents:

#!/usr/bin/env python3
#
# Copyright (c) 2023 by manyeyes
"""
This file demonstrates how to use sherpa-onnx Python API to transcribe
file(s) with a non-streaming model.
Please refer to
https://k2-fsa.github.io/sherpa/onnx/index.html
to install sherpa-onnx and to download the pre-trained models
used in this file.
"""
import wave
from typing import List, Tuple

import numpy as np
import sherpa_onnx


class Constants:
    encoder = ""  # for a zipformer model, set to its encoder, e.g. encoder-epoch-12-avg-4.int8.onnx
    decoder = ""  # for a zipformer model, set to its decoder, e.g. decoder-epoch-12-avg-4.int8.onnx
    joiner = ""   # for a zipformer model, set to its joiner, e.g. joiner-epoch-12-avg-4.int8.onnx
    tokens = "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt"  # for a zipformer model, use its tokens.txt
    num_threads = 1
    sample_rate = 16000
    feature_dim = 80
    decoding_method = "greedy_search"  # or modified_beam_search; only used when encoder is not empty
    contexts = ""  # hotword biasing; only effective with modified_beam_search
    context_score = 1.5
    debug = False
    modeling_unit = "char"
    paraformer = "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"  # the model actually used


args = Constants()
contexts_list = []
recognizer = None


def encode_contexts(args, contexts: List[str]) -> List[List[int]]:
    tokens = {}
    with open(args.tokens, "r", encoding="utf-8") as f:
        for line in f:
            toks = line.strip().split()
            tokens[toks[0]] = int(toks[1])
    return sherpa_onnx.encode_contexts(
        modeling_unit=args.modeling_unit,
        contexts=contexts,
        sp=None,
        tokens_table=tokens,
    )


def read_wave(wave_filename: str) -> Tuple[np.ndarray, int]:
    """
    Args:
      wave_filename:
        Path to a wave file. It should be single channel and each sample should
        be 16-bit. Its sample rate does not need to be 16 kHz.
    Returns:
      Return a tuple containing:
       - A 1-D array of dtype np.float32 containing the samples, which are
         normalized to the range [-1, 1].
       - The sample rate of the wave file.
    """
    with wave.open(wave_filename) as f:
        assert f.getnchannels() == 1, f.getnchannels()
        assert f.getsampwidth() == 2, f.getsampwidth()  # it is in bytes
        num_samples = f.getnframes()
        samples = f.readframes(num_samples)
        samples_int16 = np.frombuffer(samples, dtype=np.int16)
        samples_float32 = samples_int16.astype(np.float32)
        samples_float32 = samples_float32 / 32768
        return samples_float32, f.getframerate()


# Initialization (we use paraformer, so this actually creates a paraformer recognizer)
def init():
    global args
    global recognizer
    global contexts_list
    contexts_list = []
    if args.encoder:
        contexts = [x.strip().upper() for x in args.contexts.split("/") if x.strip()]
        if contexts:
            print(f"Contexts list: {contexts}")
            contexts_list = encode_contexts(args, contexts)
        recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
            encoder=args.encoder,
            decoder=args.decoder,
            joiner=args.joiner,
            tokens=args.tokens,
            num_threads=args.num_threads,
            sample_rate=args.sample_rate,
            feature_dim=args.feature_dim,
            decoding_method=args.decoding_method,
            context_score=args.context_score,
            debug=args.debug,
        )
    elif args.paraformer:
        recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
            paraformer=args.paraformer,
            tokens=args.tokens,
            num_threads=args.num_threads,
            sample_rate=args.sample_rate,
            feature_dim=args.feature_dim,
            decoding_method=args.decoding_method,
            debug=args.debug,
        )


# Speech recognition
# *sound_files: paths of the audio files to recognize
# Returns the recognized text
def asr(*sound_files):
    global recognizer
    global contexts_list
    streams = []
    for wave_filename in sound_files:
        samples, sample_rate = read_wave(wave_filename)
        if contexts_list:
            s = recognizer.create_stream(contexts_list=contexts_list)
        else:
            s = recognizer.create_stream()
        s.accept_waveform(sample_rate, samples)
        streams.append(s)
    recognizer.decode_streams(streams)
    results = [s.result.text for s in streams]
    # demo.py passes a single file, so returning the first result is enough
    return results[0] if results else ""

Save the file, then move it to snowboy's Python3 directory:

mv offlinedecode.py /home/test/snowboy/examples/Python3/

demo.py

Modify snowboy's demo.py:

cd /home/test/snowboy/examples/Python3/
vim demo.py

The main change is that after snowboy wakes the device it starts recording, and when recording ends it calls sherpa-onnx to recognize the speech. Replace the contents of demo.py with the following:

import snowboydecoder
import signal
import os
import offlinedecode

interrupted = False


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted


# Initialize speech recognition
offlinedecode.init()

# Wake-word model file
model = '../../model/hotword.pmdl'

# capture SIGINT signal, e.g., Ctrl+C
signal.signal(signal.SIGINT, signal_handler)

detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
print('Listening... Press Ctrl+C to exit')


# Callback invoked after recording finishes
# fname: path of the recorded audio file
def audio_recorder_callback(fname):
    text = offlinedecode.asr(fname)
    # Print the recognized text
    print(text)
    # Delete the recording file
    if isinstance(fname, str) and os.path.exists(fname):
        if os.path.isfile(fname):
            os.remove(fname)


# main loop
detector.start(detected_callback=snowboydecoder.play_audio_file,
               audio_recorder_callback=audio_recorder_callback,
               interrupt_check=interrupt_callback,
               sleep_time=0.03)

detector.terminate()
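demo.py shuts down through a module-level flag: the SIGINT handler flips it, and the detector loop polls interrupt_callback() roughly every sleep_time seconds until it returns True. The pattern in isolation looks like this (names mirror those in demo.py; nothing here is snowboy-specific):

```python
import signal

interrupted = False

def signal_handler(signum, frame):
    # Runs when Ctrl+C sends SIGINT; just record that we should stop
    global interrupted
    interrupted = True

def interrupt_callback():
    # Polled by the detector loop; True tells it to exit
    return interrupted

signal.signal(signal.SIGINT, signal_handler)
```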

Save the file, then test whether recognition succeeds.

Test the integration

cd /home/test/snowboy/examples/Python3/
python demo.py

On success, the recognized text is printed and the local recording file is deleted.
