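  Folks, do you need to write meeting minutes?
Friends, anything money can solve is not a real problem. I just looked at a dictation (speech-to-text) service and it is expensive: roughly 1,400 yuan even after the discount, and it is offered only as a cloud service. For some meetings, such as confidential ones that must stay offline, that is simply not an option. So what can we do? This is where PaddleSpeech comes in.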



  • 1. Split the recording into shorter segments
  • 2. Transcribe each segment
  • 3. Add punctuation to the transcribed text
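
The three steps above can be sketched as one small driver function. This is only an outline under stated assumptions: `transcribe_meeting` and its three callable parameters (`split`, `recognize`, `punctuate`) are hypothetical names introduced here for illustration, not part of PaddleSpeech. In practice each callable would wrap the corresponding tool used later in this post.

```python
from typing import Callable, Iterable, List


def transcribe_meeting(
    recording_path: str,
    split: Callable[[str], Iterable[str]],
    recognize: Callable[[str], str],
    punctuate: Callable[[str], str],
) -> List[str]:
    """Run the three steps in order; return one punctuated string per segment.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    sentences = []
    for segment_path in split(recording_path):      # step 1: split the recording
        raw_text = recognize(segment_path)           # step 2: speech to text
        if raw_text:                                 # skip segments with no text
            sentences.append(punctuate(raw_text))    # step 3: add punctuation
    return sentences
```

Passing the steps in as functions keeps the driver independent of any one library, so each step can be tried and replaced on its own.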



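PaddleSpeech is an open-source speech model library built on PaddlePaddle. It is used to develop a wide range of key speech and audio tasks and includes many state-of-the-art and influential deep learning models. Some typical applications are:

  • Speech recognition
  • Speech translation
  • Speech synthesis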


    pip install paddlespeech
  • gcc >= 4.8.5
  • paddlepaddle >= 2.3.1
  • python >= 3.7
  • Linux (recommended), macOS, Windows

2.2 Notes on installing on Windows

    !pip install paddlespeech >log.log
2.3 Quick trial

    !wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav

    --2022-07-27 00:31:57-- https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
    Resolving paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)...,, 2409:8c04:1001:1002:0:ff:b001:368a
    Connecting to paddlespeech.bj.bcebos.com (paddlespeech.bj.bcebos.com)||:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 159942 (156K) [audio/wav]
    Saving to: ‘zh.wav’
    zh.wav 100%[===================>] 156.19K --.-KB/s in 0.03s
    2022-07-27 00:31:57 (5.52 MB/s) - ‘zh.wav’ saved [159942/159942]
2.3.1 Calling the API

    from paddlespeech.cli.asr.infer import ASRExecutor
    asr = ASRExecutor()
    result = asr(audio_file="zh.wav")

    [2022-07-27 00:33:02,175] [ INFO] - checking the audio file format......

    print(result)

    我认为跑步最重要的就是给我带来了身体健康
2.3.2 Calling from the command line

    !paddlespeech asr --lang zh --input zh.wav

    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/decorators.py:68: DeprecationWarning: `formatargspec` is deprecated since Python 3.5. Use `signature` and the `Signature` object directly
    regargs, varargs, varkwargs, defaults, formatvalue=lambda value: ""
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/counter.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Sequence, defaultdict
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/nltk/lm/vocabulary.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Counter, Iterable
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data] /home/aistudio/nltk_data...
    [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
    [nltk_data] Downloading package cmudict to /home/aistudio/nltk_data...
    [nltk_data] Unzipping corpora/cmudict.zip.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import MutableMapping
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable, Mapping
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Sized
    W0727 00:38:55.935500 2181 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
    W0727 00:38:55.940197 2181 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/h5py/__init__.py:36: DeprecationWarning: `np.typeDict` is a deprecated alias for `np.sctypeDict`.
    from ._conv import register_converters as _register_converters
    我认为跑步最重要的就是给我带来了身体健康
2.3.3 Common errors


    [2022-07-26 21:13:28,589] [ INFO] - checking the audio file format......
    [2022-07-26 21:13:28,594] [ ERROR] - Please input audio file less then 50 seconds.
  • 1. The audio must be in WAV format
  • 2. The audio must be shorter than 50 seconds
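
Both constraints can be checked before a file is handed to the recognizer. The sketch below uses only Python's standard `wave` module. The helper names `wav_duration_seconds` and `is_short_enough` are made up for this example, and the 50-second limit is taken from the error message above rather than from any PaddleSpeech constant.

```python
import wave

MAX_SECONDS = 50.0  # assumption: limit copied from the error message above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_short_enough(path: str, limit: float = MAX_SECONDS) -> bool:
    """True if the file is a readable WAV strictly shorter than `limit` seconds."""
    try:
        return wav_duration_seconds(path) < limit
    except (wave.Error, EOFError):
        # Not a WAV file (or an empty or truncated one)
        return False
```

A recording that fails this check is a candidate for the splitting step described next.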




    !pip install auditok

    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Collecting auditok
    Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/3a/8b5579063cfb7ae3e89d40d495f4eff6e9cdefa14096ec0654d6aac52617/auditok-0.2.0-py3-none-any.whl (1.5 MB)
    Installing collected packages: auditok
    Successfully installed auditok-0.2.0
    [notice] A new release of pip available: 22.1.2 -> 22.2
    [notice] To update, run: pip install --upgrade pip

    from paddlespeech.cli.asr.infer import ASRExecutor
    import csv
    import moviepy.editor as mp
    import auditok
    import os
    import paddle
    from paddlespeech.cli import ASRExecutor, TextExecutor
    import soundfile
    import librosa
    import warnings
    warnings.filterwarnings('ignore')

    # Import the auditok library
    import auditok

    # Split an audio file at silences; `ty` is the input type (here: audio)
    def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
        audio_file = path
        # Read the file up front (the samples themselves are not used below)
        audio, audio_sample_rate = soundfile.read(
            audio_file, dtype="int16", always_2d=True)
        audio_regions = auditok.split(
            audio_file,
            min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
            max_dur=mmax_dur,  # maximum duration of an event
            # maximum duration of tolerated continuous silence within an event
            max_silence=mmax_silence,
            energy_threshold=menergy_threshold  # threshold of detection
        )
        # Output directory: change/<ty>/<input file name without extension>
        file_pre = os.path.splitext(os.path.basename(audio_file))[0]
        mk = 'change'
        out_dir = mk + '/' + ty + '/' + file_pre
        os.makedirs(out_dir, exist_ok=True)
        for i, r in enumerate(audio_regions):
            # Regions returned by `split` have 'start' and 'end' metadata fields
            print(
                "Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
            # Keep the last three digits of the zero-padded index so files sort in order
            s = '000000' + str(i)
            file_save = out_dir + '/' + \
                s[-3:] + '-' + '{meta.start:.3f}-{meta.end:.3f}' + '.wav'
            filename = r.save(file_save)
            print("region saved as: {}".format(filename))
        return out_dir
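
Because each saved segment carries its index, start time, and end time in its file name (for example `000-1.250-4.800.wav`), those values can be recovered later, for instance to put the transcribed text back in order. The sketch below uses only the standard library; `parse_region_name` and `sort_regions` are hypothetical helpers written for this example, and the pattern assumes exactly the naming scheme used by `qiefen` above.

```python
import os
import re
from typing import List, Tuple

# Assumed name layout: <3-digit index>-<start>-<end>.wav, times with 3 decimals
_REGION_NAME = re.compile(r"^(\d{3})-(\d+\.\d{3})-(\d+\.\d{3})\.wav$")


def parse_region_name(filename: str) -> Tuple[int, float, float]:
    """Return (index, start_seconds, end_seconds) parsed from a segment file name."""
    match = _REGION_NAME.match(os.path.basename(filename))
    if match is None:
        raise ValueError("not a segment file name: {!r}".format(filename))
    index, start, end = match.groups()
    return int(index), float(start), float(end)


def sort_regions(filenames: List[str]) -> List[str]:
    """Order segment files by start time.

    The index keeps only three digits, so it would repeat after 999 segments;
    sorting on the start time avoids depending on it.
    """
    return sorted(filenames, key=lambda name: parse_region_name(name)[1])
```

For example, `parse_region_name("change/audio/zh/001-1.250-4.800.wav")` yields the index 1 with a start of 1.25 s and an end of 4.8 s.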


