
Building a Personal Speech Dictation Service with PaddleSpeech


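1. Requirements analysis

  • Folks, do you have meeting minutes to write?

Faced with a mountain of meeting recordings and a deadline for the minutes, do you fall back on brute force: listening for hours until your head spins, the notes fill with gaps, and you start questioning your life choices? If so, you have probably thought about an automatic dictation service.

So did I. A quick search on Baidu turned up such services, but they are far from cheap (the original post showed a pricing screenshot here).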

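2. Requirements analysis, continued

Folks, anything money can solve is not really a problem. But the dictation service I just looked at is expensive, roughly 1,400 yuan even after the discount, and it is only offered as a cloud service. For meetings that must stay offline, such as confidential ones, a cloud service is simply not an option. So what can be done? This is where PaddleSpeech comes in.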

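3. Approach

As the title says, this "super simple" guide builds a speech dictation service on top of PaddleSpeech. The main steps are:

  • 1. Split the recording into shorter pieces
  • 2. Transcribe each piece
  • 3. Add punctuation to the transcribed text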
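The three steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: `split_audio`, `transcribe` and `punctuate` are hypothetical callables standing in for the splitting, recognition and punctuation steps. They are not PaddleSpeech functions, and the function name `dictate` is likewise made up for illustration.

```python
from typing import Callable, Iterable, List


def dictate(
    recording: str,
    split_audio: Callable[[str], Iterable[str]],
    transcribe: Callable[[str], str],
    punctuate: Callable[[str], str],
) -> List[str]:
    """Run split -> transcribe -> punctuate and return one text line per piece.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    lines = []
    for piece in split_audio(recording):  # step 1: cut the recording into short pieces
        raw_text = transcribe(piece)  # step 2: speech to unpunctuated text
        if raw_text:  # skip pieces where nothing was recognised
            lines.append(punctuate(raw_text))  # step 3: restore punctuation
    return lines


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without any speech model installed.
    demo = dictate(
        "meeting.wav",
        split_audio=lambda path: [path + "#0", path + "#1"],
        transcribe=lambda piece: "text from " + piece,
        punctuate=lambda text: text + ".",
    )
    print(demo)
```

Passing the steps in as functions keeps the outline independent of any particular model, so each step can be swapped or tested on its own.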

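II. Environment Setup

1. About PaddleSpeech

PaddleSpeech is an open-source model library for speech, built on PaddlePaddle. It supports development for a range of key speech and audio tasks and includes many state-of-the-art and influential deep learning models. Typical applications include:

  • Speech recognition
  • Speech translation
  • Speech synthesis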

2. Installing PaddleSpeech

    pip install paddlespeech

2.1 Dependencies

  • gcc >= 4.8.5
  • paddlepaddle >= 2.3.1
  • python >= 3.7
  • Linux (recommended), macOS, Windows
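Before installing, it can help to confirm the basics of this list. The sketch below uses only the standard library and checks the Python version, the operating system, and whether the `paddle` and `paddlespeech` packages can be found. The helper name `check_environment` is an illustrative assumption, and the sketch does not verify the gcc or paddlepaddle version numbers.

```python
import importlib.util
import platform
import sys


def check_environment(min_python=(3, 7), packages=("paddle", "paddlespeech")):
    """Report whether the interpreter is new enough and which packages can be found."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "system": platform.system(),  # e.g. "Linux", "Darwin" or "Windows"
    }
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```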

2.2 Notes on installing on Windows

    !pip install paddlespeech >log.log

2.3 Quick trial

    !wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

Output:

    --2022-07-27 00:31:57-- https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
    Resolving paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)... 182.61.200.195, 182.61.200.229, 2409:8c04:1001:1002:0:ff:b001:368a
    Connecting to paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)|182.61.200.195|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 159942 (156K) [audio/wav]
    Saving to: ‘zh.wav’
    zh.wav 100%[===================>] 156.19K --.-KB/s in 0.03s
    2022-07-27 00:31:57 (5.52 MB/s) - ‘zh.wav’ saved [159942/159942]

2.3.1 Calling the API

    from paddlespeech.cli.asr.infer import ASRExecutor
    asr = ASRExecutor()
    result = asr(audio_file="zh.wav")

Output:

    [2022-07-27 00:33:02,175] [ INFO] - checking the audio file format......

Print the result:

    print(result)

Output:

    我认为跑步最重要的就是给我带来了身体健康

2.3.2 Calling from the command line

    !paddlespeech asr --lang zh --input zh.wav

Output:

    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/decorators.py:68: DeprecationWarning: `formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly
    regargs, varargs, varkwargs, defaults, formatvalue=lambda value: ""
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/counter.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Sequence, defaultdict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/vocabulary.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Counter, Iterable
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data] /home/aistudio/nltk_data...
    [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
    [nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...
    [nltk_data] Unzipping corpora/cmudict.zip.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import MutableMapping
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable, Mapping
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Sized
    W0727 00:38:55.935500 2181 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
    W0727 00:38:55.940197 2181 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/h5py/__init__.py:36: DeprecationWarning: `np.typeDict` is a deprecated alias for `np.sctypeDict`.
    from ._conv import register_converters as _register_converters
    我认为跑步最重要的就是给我带来了身体健康

The command-line call automatically downloads a number of extra resources, which is rather slow, so you may prefer to skip it and use the API instead.

    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data] /home/aistudio/nltk_data...
    [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
    [nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...

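2.3.3 Common errors

If you run into the following error:

    [2022-07-26 21:13:28,589] [ INFO] - checking the audio file format......
    [2022-07-26 21:13:28,594] [ ERROR] - Please input audio file less then 50 seconds.

the message is clear: one check concerns the audio format, and the other requires the audio to be shorter than 50 seconds. The length problem is dealt with later in this article.

  • 1. The audio must be in WAV format

  • 2. The audio must be shorter than 50 s

For the WAV requirement, you can set the format on the voice recorder (it is usually the default), convert the file with Python code, or use a tool such as Format Factory.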
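Both conditions can be checked before calling the recognizer. The sketch below uses only Python's standard `wave` module. The helper names `wav_duration_seconds` and `needs_splitting` are illustrative, and the 50-second constant is taken from the error message above rather than from any PaddleSpeech API. It only handles PCM WAV files; anything `wave` cannot parse is treated as not being a usable WAV.

```python
import wave

MAX_SECONDS = 50.0  # limit quoted in the error message above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds, or None if it is not a readable WAV."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / float(wav.getframerate())
    except (wave.Error, EOFError):
        return None


def needs_splitting(path):
    """True when the file is a WAV whose duration reaches or exceeds the limit."""
    duration = wav_duration_seconds(path)
    if duration is None:
        raise ValueError("{} is not a readable WAV file; convert it first".format(path))
    return duration >= MAX_SECONDS
```

A file for which `needs_splitting` returns True has to be cut into shorter pieces before recognition.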

3. Audio splitting

The auditok library is used here.

    !pip install auditok

Output:

    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting auditok
    Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/3a/8b5579063cfb7ae3e89d40d495f4eff6e9cdefa14096ec0654d6aac52617/auditok-0.2.0-py3-none-any.whl (1.5 MB)
    Installing collected packages: auditok
    Successfully installed auditok-0.2.0
    [notice] A new release of pip available: 22.1.2 -> 22.2
    [notice] To update, run: pip install --upgrade pip

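III. Audio Splitting

As explained above, PaddleSpeech accepts at most 50 s of audio per recognition call, so longer recordings have to be split. Here we simply call auditok to do it.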

    from paddlespeech.cli.asr.infer import ASRExecutor
    import csv
    import moviepy.editor as mp
    import auditok
    import os
    import paddle
    from paddlespeech.cli import ASRExecutor, TextExecutor
    import soundfile
    import librosa
    import warnings
    warnings.filterwarnings('ignore')
    # Import the libraries used by the splitting function
    import os
    import auditok

    # The input type defaults to 'audio'
    def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
        """Split the recording at `path` on silence and save each region as a WAV file.

        Returns the directory holding the pieces: change/<ty>/<file name without extension>.
        Note: the default mmax_dur of 100000 s effectively disables the length cap;
        pass a value below 50 if every piece must fit the 50 s limit.
        """
        # File name without directory or extension, e.g. "zh" for "data/zh.wav"
        file_pre = os.path.splitext(os.path.basename(path))[0]
        out_dir = 'change' + '/' + ty + '/' + file_pre
        # Create the output directory up front so it exists even if no region is found
        os.makedirs(out_dir, exist_ok=True)
        audio_regions = auditok.split(
            path,
            min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
            max_dur=mmax_dur,  # maximum duration of an event
            max_silence=mmax_silence,  # maximum duration of tolerated continuous silence within an event
            energy_threshold=menergy_threshold,  # threshold of detection
        )
        for i, r in enumerate(audio_regions):
            # Regions returned by `split` have 'start' and 'end' metadata fields
            print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
            # Zero-padded three-digit index so the saved pieces sort in order
            file_save = out_dir + '/' + '{:03d}'.format(i) + '-' + '{meta.start:.3f}-{meta.end:.3f}' + '.wav'
            filename = r.save(file_save)
            print("region saved as: {}".format(filename))
        return out_dir
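With the default `mmax_dur=100000`, a stretch of speech without a long enough pause can still come out longer than the 50-second limit. One hedged way to guard against that is to cut any over-long span into fixed windows. The helper below is purely illustrative: the name `fixed_windows` and the 45-second window are assumptions, not part of the original post or of auditok. It only computes the time boundaries and leaves the actual cutting to whatever audio library is in use.

```python
def fixed_windows(start, end, max_len=45.0):
    """Split the span [start, end) in seconds into consecutive windows no longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    windows = []
    cursor = start
    while cursor < end:
        stop = min(cursor + max_len, end)  # never run past the end of the span
        windows.append((cursor, stop))
        cursor = stop
    return windows


if __name__ == "__main__":
    print(fixed_windows(0.0, 100.0))  # [(0.0, 45.0), (45.0, 90.0), (90.0, 100.0)]
```

A fixed cut can land in the middle of a word, so this is best kept as a fallback for spans that silence-based splitting could not shorten.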


 
