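When you face a mountain of meeting recordings and need the minutes in a hurry, do you really have to grind through them by hand or throw people at the problem? After listening until your head spins, your notes are full of holes, and you start to question your life choices, has an automatic transcription service crossed your mind?
It certainly crossed mine. A quick search on Baidu, though, shows that these services are not cheap. See the figure below.
Folks, anything money can solve is not a real problem, but the transcription service I just looked at is expensive: roughly 1,400 yuan, and that is the discounted price. It is also offered only as a cloud service, so it cannot be used at all for meetings that must stay offline, such as confidential ones. What then? This is where PaddleSpeech comes in.
"[Super Simple] Building a Personal Speech Transcription Service with PaddleSpeech" does what the title says: it uses PaddleSpeech to build a speech transcription service, and the main idea is outlined below.
PaddleSpeech is an open-source model library for speech, built on PaddlePaddle, for developing a variety of key tasks in speech and audio. It contains a large number of cutting-edge and influential deep-learning models. Typical applications include speech recognition, which is the one used here.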
- pip install paddlespeech
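1. On Windows, the Microsoft C++ Build Tools (Visual Studio, visualstudio.microsoft.com/zh-hans/vis…) must be installed, because installing non-pure-Python packages or compiling Cython or Pyrex files requires a C++ compiler.
2. Reference: WindowsCompilers - Python Wiki, wiki.python.org/moin/Window…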
- !pip install paddlespeech >log.log
- !wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
- --2022-07-27 00:31:57-- https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
- Resolving paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)... 182.61.200.195, 182.61.200.229, 2409:8c04:1001:1002:0:ff:b001:368a
- Connecting to paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)|182.61.200.195|:443... connected.
- HTTP request sent, awaiting response... 200 OK
- Length: 159942 (156K) [audio/wav]
- Saving to: ‘zh.wav’
-
- zh.wav 100%[===================>] 156.19K --.-KB/s in 0.03s
-
- 2022-07-27 00:31:57 (5.52 MB/s) - ‘zh.wav’ saved [159942/159942]
- from paddlespeech.cli.asr.infer import ASRExecutor
-
- asr = ASRExecutor()
- result = asr(audio_file="zh.wav")
- [2022-07-27 00:33:02,175] [ INFO] - checking the audio file format......
- print(result)
- 我认为跑步最重要的就是给我带来了身体健康
- !paddlespeech asr --lang zh --input zh.wav
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/decorators.py:68: DeprecationWarning: `formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly
- regargs, varargs, varkwargs, defaults, formatvalue=lambda value: ""
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/counter.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
- from collections import Sequence, defaultdict
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/vocabulary.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
- from collections import Counter, Iterable
- [nltk_data] Downloading package averaged_perceptron_tagger to
- [nltk_data] /home/aistudio/nltk_data...
- [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
- [nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...
- [nltk_data] Unzipping corpora/cmudict.zip.
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
- from collections import MutableMapping
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
- from collections import Iterable, Mapping
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
- from collections import Sized
- W0727 00:38:55.935500 2181 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
- W0727 00:38:55.940197 2181 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
- /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/h5py/__init__.py:36: DeprecationWarning: `np.typeDict` is a deprecated alias for `np.sctypeDict`.
- from ._conv import register_converters as _register_converters
- 我认为跑步最重要的就是给我带来了身体健康

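The command-line tool automatically downloads a batch of extra resources (the NLTK data shown below), which is rather slow, so you may prefer not to use it.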
- [nltk_data] Downloading package averaged_perceptron_tagger to
- [nltk_data] /home/aistudio/nltk_data...
- [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
- [nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...
If you run into the following error:
- [2022-07-26 21:13:28,589] [ INFO] - checking the audio file format......
- [2022-07-26 21:13:28,594] [ ERROR] - Please input audio file less then 50 seconds.
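The error is clear. The log points to two things: the audio format check and the under-50-seconds limit. If you hit this problem, it is solved further below.
1. The audio must be in WAV format.
2. The audio must be shorter than 50 s.
To get WAV audio, you can set the format on the voice recorder (it is usually the default), convert with Python code, or convert with Format Factory.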
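Both requirements can be checked before the recognizer is called. The following is a minimal sketch that uses only Python's standard `wave` module; the helper name `check_wav` and its `max_seconds` parameter are made up for illustration and are not part of PaddleSpeech.

```python
import wave


def check_wav(path, max_seconds=50.0):
    """Return (ok, duration_seconds, reason) for a candidate input file.

    Hypothetical helper, not part of PaddleSpeech.
    """
    try:
        with wave.open(path, "rb") as wav:
            # Duration = number of frames / frames per second
            duration = wav.getnframes() / float(wav.getframerate())
    except (wave.Error, EOFError):
        # The standard library could not parse the file as PCM WAV
        return False, 0.0, "not a readable PCM WAV file"
    if duration >= max_seconds:
        return False, duration, "too long, split it first"
    return True, duration, "ok"
```

A call such as `check_wav("zh.wav")` returns a tuple whose first element says whether the file can be passed to the recognizer as it is.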
Splitting is done here with the auditok library.
- !pip install auditok
- Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
- Collecting auditok
- Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/3a/8b5579063cfb7ae3e89d40d495f4eff6e9cdefa14096ec0654d6aac52617/auditok-0.2.0-py3-none-any.whl (1.5 MB)
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 15.4 MB/s eta 0:00:00
- Installing collected packages: auditok
- Successfully installed auditok-0.2.0
-
- [notice] A new release of pip available: 22.1.2 -> 22.2
- [notice] To update, run: pip install --upgrade pip
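As explained above, PaddleSpeech recognizes at most 50 s of audio at a time, so longer recordings have to be split. Here we simply call the library to do it.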
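For comparison, the simplest possible way to stay under the limit is to cut the file at fixed times. The sketch below does that with the standard `wave` module only; it is a swapped-in fallback and not the method used in this article, the name `split_fixed` and the 45 s default are assumptions, and a fixed cut can land in the middle of a word, which is why the article's own code splits at silences with auditok.

```python
import os
import wave


def split_fixed(path, out_dir, max_seconds=45):
    """Cut a PCM WAV file into consecutive pieces of at most max_seconds.

    Hypothetical fallback helper; it cuts at fixed times, not at silences.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_piece = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_piece)
            if not frames:
                break  # end of the recording
            piece = os.path.join(out_dir, "{:03d}.wav".format(index))
            with wave.open(piece, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close
                dst.setparams(params)
                dst.writeframes(frames)
            saved.append(piece)
            index += 1
    return saved
```

The function returns the paths of the pieces in order, so a 100 s recording cut with the default setting gives three files.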
- from paddlespeech.cli.asr.infer import ASRExecutor
- import csv
- import moviepy.editor as mp
- import auditok
- import os
- import paddle
- from paddlespeech.cli import ASRExecutor, TextExecutor
- import soundfile
- import librosa
- import warnings
-
- warnings.filterwarnings('ignore')
- # Import the auditok library
- import auditok
- # The input category `ty` is 'audio'
- def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
-     audio_file = path
-     # Read the file with soundfile (the returned samples are not used below)
-     audio, audio_sample_rate = soundfile.read(
-         audio_file, dtype="int16", always_2d=True)
-
-     audio_regions = auditok.split(
-         audio_file,
-         min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
-         max_dur=mmax_dur,  # maximum duration of an event; pass a value below 50 for PaddleSpeech
-         # maximum duration of tolerated continuous silence within an event
-         max_silence=mmax_silence,
-         energy_threshold=menergy_threshold  # threshold of detection
-     )
-
-     # Output folder: change/<ty>/<input file name without extension>.
-     # Created before the loop so the return value exists even if no region is found.
-     file_pre = os.path.splitext(os.path.basename(audio_file))[0]
-     out_dir = 'change' + '/' + ty + '/' + file_pre
-     os.makedirs(out_dir, exist_ok=True)
-
-     for i, r in enumerate(audio_regions):
-         # Regions returned by `split` have 'start' and 'end' metadata fields
-         print(
-             "Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
-
-         # Zero-padded three-digit index so the saved pieces sort in order
-         s = '{:03d}'.format(i)
-
-         file_save = out_dir + '/' + \
-             s + '-' + '{meta.start:.3f}-{meta.end:.3f}' + '.wav'
-         filename = r.save(file_save)
-         print("region saved as: {}".format(filename))
-     return out_dir
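The function returns the folder that holds the pieces, and each file name encodes the piece index and its start and end time. A small sketch of how those names could be read back in playback order follows; `list_pieces` is a hypothetical helper, and the pattern assumes names such as `000-0.000-3.250.wav`, as produced by the `file_save` template above.

```python
import os
import re

# index-start-end.wav, e.g. "000-0.000-3.250.wav" (assumed naming scheme)
PIECE_NAME = re.compile(r"^(\d+)-(\d+\.\d+)-(\d+\.\d+)\.wav$")


def list_pieces(folder):
    """Return [(index, start_s, end_s, path), ...] sorted by index."""
    pieces = []
    for name in os.listdir(folder):
        match = PIECE_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        index, start, end = match.groups()
        pieces.append((int(index), float(start), float(end),
                       os.path.join(folder, name)))
    # Tuples sort by their first element, the numeric index
    return sorted(pieces)
```

Sorting on the numeric index rather than on the raw file name keeps the order correct even when a recording yields more pieces than three digits can hold, and the start and end times can be carried into the minutes as timestamps.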
