Open-Source Speech Recognition Frameworks

Whisper

Features

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
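
To make the multitask format concrete, here is a minimal sketch of how a task can be encoded as a prefix of special tokens for the decoder. The helper name `build_decoder_prompt` is hypothetical and is not part of the `whisper` package; the token spellings follow the Whisper paper, and the real tokenizer maps them to integer ids rather than strings.

```python
from typing import List


def build_decoder_prompt(language: str, task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Build the special-token prefix that tells the decoder what to do.

    Hypothetical helper for illustration only; it returns token strings,
    not the integer ids a real tokenizer would produce.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")

    tokens = ["<|startoftranscript|>"]   # marks the start of a prediction
    tokens.append(f"<|{language}|>")     # language tag, e.g. <|zh|> or <|en|>
    tokens.append(f"<|{task}|>")         # task specifier
    if not timestamps:
        tokens.append("<|notimestamps|>")  # ask for plain text, no time marks
    return tokens


if __name__ == "__main__":
    print(build_decoder_prompt("zh"))
    print(build_decoder_prompt("fr", task="translate", timestamps=True))
```

Switching from transcription to translation changes a single token in the prefix, which is how one model can stand in for several pipeline stages.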

GitHub repository

https://github.com/openai/whisper

Project documentation

https://openai.com/research/whisper

Reference paper

https://arxiv.org/abs/2212.04356

ASRT

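Features

ASRT (Auto Speech Recognition Tool) is a deep-learning-based speech recognition system developed by the blogger AI柠檬 (AI Lemon) and open-sourced on GitHub under the GPL 3.0 license. Its acoustic model combines a convolutional neural network (CNN) with connectionist temporal classification (CTC) and is trained on large Chinese speech datasets to transcribe audio into Chinese pinyin; a language model then converts the pinyin sequence into Chinese text. The model has reached 80% accuracy on the test set. A speech recognition application built on ASRT has also been implemented for Windows, with good results in practice. It is available as a Windows 10 UWP Store app and as a Windows .NET desktop app, both of which are also open-sourced on GitHub.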
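
The CTC step described above can be sketched with a greedy decoder: take the most likely label in each audio frame, merge consecutive repeats, and drop the blank symbol. This is an illustration of the general technique, not ASRT's own decoder; the function name, the tiny pinyin label set, and the position of the blank label are all assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: int) -> List[str]:
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    decoded: List[str] = []
    previous = None
    for probs in frame_probs:
        # Index of the most probable label in this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(labels[best])
        previous = best
    return decoded


if __name__ == "__main__":
    # Hypothetical label set: two pinyin syllables plus a blank at index 2.
    labels = ["ni3", "hao3", "<blank>"]
    frames = [
        [0.8, 0.1, 0.1],  # ni3
        [0.7, 0.2, 0.1],  # ni3 again, merged with the previous frame
        [0.1, 0.1, 0.8],  # blank
        [0.1, 0.8, 0.1],  # hao3
        [0.2, 0.7, 0.1],  # hao3 again, merged
    ]
    print(ctc_greedy_decode(frames, labels, blank=2))
```

The resulting pinyin sequence is what a language model would then turn into Chinese characters. The blank symbol is what lets CTC represent a genuinely repeated syllable: two identical labels separated by a blank frame are kept as two outputs.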

Environment

Hardware

  • CPU: 4+ cores (x86_64, amd64)
  • RAM: 16 GB+
  • GPU: NVIDIA with 11 GB+ of video memory (GTX 1080 Ti or better)
  • Disk: 500 GB hard disk drive (or solid-state drive)

Software

  • OS: Linux (Ubuntu 18.04+ / CentOS 7+) or Windows 10/11
  • Python: 3.7 to 3.10 and later
  • TensorFlow: 2.5 to 2.11 and later
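
The software requirements above can be checked with a short script before installing the project. This is a minimal sketch, not a tool shipped with ASRT: the helper names are hypothetical, and the minimum versions are simply the lower bounds from the list.

```python
import sys
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '2.11.0' into (2, 11) so versions compare numerically."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed major.minor version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment() -> Dict[str, bool]:
    """Report whether Python and TensorFlow meet the listed lower bounds."""
    report = {"python": sys.version_info[:2] >= (3, 7)}
    try:
        import tensorflow as tf  # optional: only checked if installed
        report["tensorflow"] = meets_minimum(tf.__version__, "2.5")
    except ImportError:
        report["tensorflow"] = False
    return report
```

Calling `check_environment()` returns a dictionary such as `{"python": True, "tensorflow": False}`. Versions are compared as integer tuples because a plain string comparison would rank "2.11" below "2.5".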

GitHub repository

https://github.com/nl8590687/ASRT_SpeechRecognition

Project documentation

https://wiki.ailemon.net/docs/asrt-doc/asrt-doc-1demhoid4inc6

DeepSpeech

Features

DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.

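It is an end-to-end deep learning speech system that does not rely on phonemes, trained with an optimized RNN training system that uses multiple GPUs.

Environment

Built on TensorFlow.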

GitHub repository

https://github.com/mozilla/DeepSpeech

Project documentation

https://deepspeech.readthedocs.io/en/r0.9/?badge=latest

Reference paper

https://arxiv.org/abs/1412.5567

DeepSpeech2

Environment

Built on PaddlePaddle.

Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
