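The main goal of this series is to build the basic functions of a smart speaker: wake-word detection, speech recognition (speech to text), handling user requests (such as checking the weather, mainly through custom intents defined in rasa), and speech synthesis (text to speech).
Speech recognition and speech synthesis both run offline.
Speech recognition uses sherpa-onnx, which supports offline recognition of Chinese and English.
Some of the packages used in this article were already built as part of the prerequisites in the snowboy article. Once offline speech recognition is installed, the related code is added to the snowboy project, so that after the wake word is detected, speech recognition is called to transcribe what the user says.
Wake-word article:
snowboy custom wake word for voice wake-up [Voice Assistant] (殷长庆's blog, CSDN)
sherpa-onnx tutorial (following the installation steps on the official site is strongly recommended):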
Installation — sherpa 1.3 documentation
Pre-trained models for sherpa-onnx:
Pre-trained models — sherpa 1.3 documentation
- cd /home/test
-
- git clone https://github.com/k2-fsa/sherpa-onnx
- cd sherpa-onnx
- mkdir build
- cd build
- cmake -DCMAKE_BUILD_TYPE=Release ..
- make -j6
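After the build finishes, the sherpa-onnx executables are in the build/bin directory.
I chose the offline-paraformer model because it supports offline recognition of both Chinese and English. It recognizes speech from wav audio files, which is exactly what is needed here.
Official documentation: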
Paraformer models — sherpa 1.3 documentation
The steps are:
- cd /home/test/sherpa-onnx
-
- GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/sherpa-onnx-paraformer-zh-2023-03-28
- cd sherpa-onnx-paraformer-zh-2023-03-28
- git lfs pull --include "*.onnx"
Check that the download succeeded, paying attention to the size of the model files:
- sherpa-onnx-paraformer-zh-2023-03-28$ ls -lh *.onnx
- -rw-r--r-- 1 kuangfangjun root 214M Apr 1 07:28 model.int8.onnx
- -rw-r--r-- 1 kuangfangjun root 824M Apr 1 07:28 model.onnx
There are two model files. In my tests on this machine the difference between them was small, so I chose the int8 version.
Test the speech recognition:
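Besides eyeballing the sizes, it is possible to check that the .onnx files are real model files rather than Git LFS pointer stubs (small text files whose first line is `version https://git-lfs.github.com/spec/v1`), which is what a clone with `GIT_LFS_SKIP_SMUDGE=1` leaves behind until `git lfs pull` runs. The following is only a minimal sketch; the helper names `is_lfs_pointer` and `find_unpulled_models` are made up for illustration and are not part of sherpa-onnx.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer that was never pulled."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unpulled_models(model_dir):
    """Return the sorted names of *.onnx files in `model_dir` that are still pointers."""
    return sorted(
        p.name for p in Path(model_dir).glob("*.onnx") if is_lfs_pointer(p)
    )
```

Calling `find_unpulled_models("/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28")` should return an empty list once both models have been pulled; any name it returns needs another `git lfs pull`.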
- cd /home/test/sherpa-onnx
-
- ./build/bin/sherpa-onnx-offline \
- --tokens=./sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt \
- --paraformer=./sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx \
- ./sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav
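If the expected recognition result is printed, the speech recognition setup is complete.
The Python API examples live in the python-api-examples directory of the sherpa-onnx repository. The file needed here is offline-decode-files.py, whose main() function recognizes a wav file offline.
Next, make a few small changes to that file: set the model parameters as defaults, and return the recognized text once recognition finishes.
Rename offline-decode-files.py to offlinedecode.py, or create a new offlinedecode.py file: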
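The same command line can also be assembled from Python, which is handy for testing several wav files in a row. This is a sketch under the directory layout used above; `build_decode_command` and `run_decode` are hypothetical helper names, and only the `--tokens` and `--paraformer` flags already shown are used.

```python
import subprocess
from pathlib import Path


def build_decode_command(repo_dir, wav_path, model_file="model.int8.onnx"):
    """Build the argument list for sherpa-onnx-offline with the paraformer model."""
    repo = Path(repo_dir)
    model_dir = repo / "sherpa-onnx-paraformer-zh-2023-03-28"
    return [
        str(repo / "build" / "bin" / "sherpa-onnx-offline"),
        f"--tokens={model_dir / 'tokens.txt'}",
        f"--paraformer={model_dir / model_file}",
        str(wav_path),
    ]


def run_decode(repo_dir, wav_path):
    """Run the binary and return (exit code, combined stdout and stderr text)."""
    proc = subprocess.run(
        build_decode_command(repo_dir, wav_path),
        capture_output=True,
        text=True,
        errors="replace",
    )
    return proc.returncode, proc.stdout + proc.stderr
```

For example, `run_decode("/home/test/sherpa-onnx", "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28/test_wavs/0.wav")` runs the same test as the shell command above.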
- touch offlinedecode.py
-
- vim offlinedecode.py
Edit the file so that it contains:
- #!/usr/bin/env python3
- #
- # Copyright (c) 2023 by manyeyes
-
- """
- This file demonstrates how to use sherpa-onnx Python API to transcribe
- file(s) with a non-streaming model.
- Please refer to
- https://k2-fsa.github.io/sherpa/onnx/index.html
- to install sherpa-onnx and to download the pre-trained models
- used in this file.
- """
- import time
- import wave
- from typing import List, Tuple
-
- import numpy as np
- import sherpa_onnx
-
-
- class Constants:
-     # Transducer model files; leave these empty when using paraformer.
-     encoder = ""  # for a zipformer model, e.g. encoder-epoch-12-avg-4.int8.onnx
-     decoder = ""  # for a zipformer model, e.g. decoder-epoch-12-avg-4.int8.onnx
-     joiner = ""  # for a zipformer model, e.g. joiner-epoch-12-avg-4.int8.onnx
-     tokens = "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28/tokens.txt"  # for a zipformer model, point this at the zipformer tokens.txt
-     num_threads = 1
-     sample_rate = 16000
-     feature_dim = 80
-     decoding_method = "greedy_search"  # or modified_beam_search, only used when encoder is not empty
-     contexts = ""  # hotwords for biasing, separated by "/", only used with modified_beam_search
-     context_score = 1.5
-     debug = False
-     modeling_unit = "char"
-     paraformer = "/home/test/sherpa-onnx/sherpa-onnx-paraformer-zh-2023-03-28/model.int8.onnx"  # the model actually used here
-
- # Module-level state shared by init() and asr()
- args = Constants()
- contexts_list = []
- recognizer = None
-
- def encode_contexts(args, contexts: List[str]) -> List[List[int]]:
-     tokens = {}
-     with open(args.tokens, "r", encoding="utf-8") as f:
-         for line in f:
-             toks = line.strip().split()
-             tokens[toks[0]] = int(toks[1])
-     return sherpa_onnx.encode_contexts(
-         modeling_unit=args.modeling_unit, contexts=contexts, sp=None, tokens_table=tokens
-     )
-
-
- def read_wave(wave_filename: str) -> Tuple[np.ndarray, int]:
-     """
-     Args:
-       wave_filename:
-         Path to a wave file. It should be single channel and each sample should
-         be 16-bit. Its sample rate does not need to be 16kHz.
-     Returns:
-       Return a tuple containing:
-        - A 1-D array of dtype np.float32 containing the samples, which are
-          normalized to the range [-1, 1].
-        - sample rate of the wave file
-     """
-
-     with wave.open(wave_filename) as f:
-         assert f.getnchannels() == 1, f.getnchannels()
-         assert f.getsampwidth() == 2, f.getsampwidth()  # it is in bytes
-         num_samples = f.getnframes()
-         samples = f.readframes(num_samples)
-         samples_int16 = np.frombuffer(samples, dtype=np.int16)
-         samples_float32 = samples_int16.astype(np.float32)
-
-         samples_float32 = samples_float32 / 32768
-         return samples_float32, f.getframerate()
-
- # Initialization (paraformer is configured above, so the paraformer recognizer is the one actually created)
- def init():
-     global args
-     global recognizer
-     global contexts_list
-     contexts_list = []
-     if args.encoder:
-         contexts = [x.strip().upper() for x in args.contexts.split("/") if x.strip()]
-         if contexts:
-             print(f"Contexts list: {contexts}")
-         contexts_list = encode_contexts(args, contexts)
-
-         recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
-             encoder=args.encoder,
-             decoder=args.decoder,
-             joiner=args.joiner,
-             tokens=args.tokens,
-             num_threads=args.num_threads,
-             sample_rate=args.sample_rate,
-             feature_dim=args.feature_dim,
-             decoding_method=args.decoding_method,
-             context_score=args.context_score,
-             debug=args.debug,
-         )
-     elif args.paraformer:
-         recognizer = sherpa_onnx.OfflineRecognizer.from_paraformer(
-             paraformer=args.paraformer,
-             tokens=args.tokens,
-             num_threads=args.num_threads,
-             sample_rate=args.sample_rate,
-             feature_dim=args.feature_dim,
-             decoding_method=args.decoding_method,
-             debug=args.debug,
-         )
-
- # Speech recognition
- # *sound_files: paths of the audio files to recognize
- # return: the recognized text of the first file ("" if no file is given)
- def asr(*sound_files):
-     global recognizer
-     global contexts_list
-
-     streams = []
-     for wave_filename in sound_files:
-         samples, sample_rate = read_wave(wave_filename)
-         if contexts_list:
-             s = recognizer.create_stream(contexts_list=contexts_list)
-         else:
-             s = recognizer.create_stream()
-         s.accept_waveform(sample_rate, samples)
-
-         streams.append(s)
-
-     recognizer.decode_streams(streams)
-     results = [s.result.text for s in streams]
-
-     # The original loop returned on its first iteration, so only the first result is used.
-     return results[0] if results else ""
Save the file, then move it to snowboy's Python3 directory:
- mv offlinedecode.py /home/test/snowboy/examples/Python3/
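read_wave() in offlinedecode.py asserts that its input is mono with 16-bit samples, so a recording in any other format raises an AssertionError. A small pre-check that uses only the standard library can report the reason instead. This is a sketch, not part of sherpa-onnx or snowboy; `check_wave` and `write_silence` are hypothetical helper names.

```python
import wave


def check_wave(path):
    """Return (ok, reason); ok is True for a mono, 16-bit PCM WAV file."""
    try:
        with wave.open(path, "rb") as f:
            channels = f.getnchannels()
            width = f.getsampwidth()  # bytes per sample
            rate = f.getframerate()
    except (wave.Error, EOFError, OSError) as e:
        return False, f"not a readable WAV file: {e}"
    if channels != 1:
        return False, f"expected 1 channel, got {channels}"
    if width != 2:
        return False, f"expected 16-bit samples, got {width * 8}-bit"
    return True, f"ok, {rate} Hz"


def write_silence(path, seconds=1.0, rate=16000, channels=1):
    """Write a silent 16-bit WAV file, useful for trying the check without a microphone."""
    with wave.open(path, "wb") as f:
        f.setnchannels(channels)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"\x00\x00" * int(seconds * rate) * channels)
```

A caller could run `check_wave(fname)` before `offlinedecode.asr(fname)` and skip recognition when the first element of the result is False.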
Modify snowboy's demo.py:
- cd /home/test/snowboy/examples/Python3/
-
- vim demo.py
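The main change is this: after snowboy wakes the device, recording starts, and when recording ends, sherpa-onnx is called to recognize the speech. Change demo.py to the following: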
- import snowboydecoder
- import signal
- import os
- import offlinedecode
-
- interrupted = False
-
- def signal_handler(signal, frame):
-     global interrupted
-     interrupted = True
-
- def interrupt_callback():
-     global interrupted
-     return interrupted
-
-
- # Initialize speech recognition
- offlinedecode.init()
-
- # Wake-word model file
- model = '../../model/hotword.pmdl'
-
- # capture SIGINT signal, e.g., Ctrl+C
- signal.signal(signal.SIGINT, signal_handler)
-
- detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
- print('Listening... Press Ctrl+C to exit')
-
- # Callback invoked after recording finishes
- # fname: path of the recorded audio file
- def audio_recorder_callback(fname):
-     text = offlinedecode.asr(fname)
-     # Print the recognized text
-     print(text)
-     # Delete the recording
-     if isinstance(fname, str) and os.path.isfile(fname):
-         os.remove(fname)
-
-
- # main loop
- detector.start(detected_callback=snowboydecoder.play_audio_file,
- audio_recorder_callback=audio_recorder_callback,
- interrupt_check=interrupt_callback,
- sleep_time=0.03)
-
- detector.terminate()
Save the file, then test whether recognition works:
- cd /home/test/snowboy/examples/Python3/
-
- python demo.py
On success, the recognized text is printed and the local recording file is deleted.
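In the callback above, an exception during recognition would skip the cleanup and leave the recording on disk. One possible variation wraps the call in try/finally so the file is removed either way. This is a sketch under that assumption; `recognize_and_cleanup` is a hypothetical helper, and `recognize` stands for any callable that takes a file path and returns text, such as `offlinedecode.asr`.

```python
import os


def recognize_and_cleanup(fname, recognize):
    """Run recognize(fname) and delete the recording afterwards, even on failure.

    Returns the recognized text, or None if recognition raised an exception.
    """
    try:
        return recognize(fname)
    except Exception as e:
        # Report the problem but keep the wake-word loop running.
        print(f"recognition failed: {e}")
        return None
    finally:
        # Runs on both the success and the failure path.
        if isinstance(fname, str) and os.path.isfile(fname):
            os.remove(fname)
```

Inside `audio_recorder_callback`, this could be used as `text = recognize_and_cleanup(fname, offlinedecode.asr)` followed by `print(text)` when the result is not None.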