Open-Source Speech Recognition Frameworks

Author: 花生_TL007 | 2024-04-04 16:54:32

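Table of contents

- Open-Source Speech Recognition Frameworks
  - Whisper
  - ASRT
  - DeepSpeech
  - DeepSpeech2
  - ESPNET
  - kaldi
  - sherpa-ncnn
  - Wenet
  - Speechbrain
  - Vosk API
  - fairseq (traditional end-to-end) framework
  - Eesen
  - Athena
    - Features & environment
    - GitHub URL
  - PIKA
  - SpeechLM (currently unavailable)
  - Alibaba-MIT-Speech
    - Features
    - GitHub URL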

Whisper

Features

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
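To make the "special tokens as task specifiers" idea concrete, here is a minimal sketch of how a decoder prompt could be assembled from such tokens. The token spellings follow the ones used in the Whisper paper and repository (`<|startoftranscript|>`, a language tag, `<|transcribe|>` or `<|translate|>`, `<|notimestamps|>`); the function name `build_task_prefix` is hypothetical and is not part of the Whisper package.

```python
from typing import List


def build_task_prefix(language: str, task: str = "transcribe",
                      timestamps: bool = False) -> List[str]:
    """Assemble the special-token prefix that tells the decoder what to do.

    Illustrative helper only; it works on token strings, not token IDs.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    tokens = ["<|startoftranscript|>"]   # marks the start of decoding
    tokens.append(f"<|{language}|>")     # language tag, e.g. <|en|> or <|zh|>
    tokens.append(f"<|{task}|>")         # task specifier
    if not timestamps:
        tokens.append("<|notimestamps|>")  # ask for text without timestamps
    return tokens


if __name__ == "__main__":
    # Chinese speech in, English text out, no timestamps.
    print(build_task_prefix("zh", task="translate"))
```

Switching from transcription to translation changes only one token in the prefix, which is how a single model can cover several tasks.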

GitHub URL

https://github.com/openai/whisper

Open-source documentation

https://openai.com/research/whisper

Paper reference

https://arxiv.org/abs/2212.04356

ASRT

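Features

ASRT is a deep-learning-based speech recognition system whose full name is Auto Speech Recognition Tool. It was developed by the blogger AI柠檬 (AI Lemon) and open-sourced on GitHub under the GPL 3.0 license. The project's acoustic model uses a convolutional neural network (CNN) with connectionist temporal classification (CTC), trained on large Chinese speech datasets, to transcribe audio into Chinese pinyin; a language model then converts the pinyin sequence into Chinese text. The model has reached 80% accuracy on the test set. Based on this model, a speech recognition application built on ASRT has been implemented for Windows, with good results in practice. The application comes as a Windows 10 UWP Store app and a Windows .NET desktop app, both of which are also open-sourced on GitHub.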
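The two-stage flow described above (acoustic model with CTC produces pinyin, then pinyin is converted to characters) can be sketched in a few lines. This is not ASRT's code: it shows standard greedy CTC decoding on made-up frame probabilities, and a plain lookup table stands in for the language model. The vocabulary, the probabilities, and the names `ctc_greedy_decode` and `TOY_LEXICON` are all assumptions for illustration.

```python
from itertools import groupby
from typing import Dict, List, Sequence

BLANK = 0  # index reserved for the CTC blank label (assumed convention)

# Hypothetical toy vocabulary: label index -> pinyin syllable.
TOY_VOCAB: Dict[int, str] = {1: "ni3", 2: "hao3"}

# Hypothetical stand-in for the language model: one character per syllable.
TOY_LEXICON: Dict[str, str] = {"ni3": "你", "hao3": "好"}


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    merged = [label for label, _ in groupby(best_path)]
    return [label for label in merged if label != blank]


def pinyin_to_text(syllables: Sequence[str]) -> str:
    """Map each syllable through the toy lexicon (a real system uses an LM)."""
    return "".join(TOY_LEXICON[s] for s in syllables)


if __name__ == "__main__":
    # Five frames over three labels: blank, "ni3", "hao3".
    frames = [
        [0.1, 0.8, 0.1],
        [0.1, 0.8, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.2, 0.1, 0.7],
    ]
    labels = ctc_greedy_decode(frames)
    syllables = [TOY_VOCAB[i] for i in labels]
    print(syllables, pinyin_to_text(syllables))
```

A real pinyin-to-text step has to choose among many homophones, which is why ASRT uses a language model there rather than a fixed table.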

Environment

Hardware

CPU: 4 cores (x86_64, amd64) or more
RAM: 16 GB or more
GPU: NVIDIA, 11 GB+ of GPU memory (GTX 1080 Ti or better)
Disk: 500 GB HDD (or SSD)

Software

Linux: Ubuntu 18.04+ / CentOS 7+, or Windows 10/11
Python: 3.7 to 3.10 and later
TensorFlow: 2.5 to 2.11 and later
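The software requirements can be checked before installing ASRT with a small script like the one below. It is only a sketch: the minimum versions are taken from the list above, the helper names are hypothetical, and the lookup assumes TensorFlow was installed under the package name `tensorflow` (variants published under other names would not be detected).

```python
import re
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 7)      # from the requirements listed above
MIN_TENSORFLOW = (2, 5)  # from the requirements listed above


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.11.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_tensorflow_version() -> Optional[str]:
    """Return the installed 'tensorflow' package version, or None."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        return None
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version: Tuple[int, int],
                      tensorflow_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append("Python 3.7 or later is required")
    if tensorflow_version is None:
        problems.append("TensorFlow was not found")
    elif parse_major_minor(tensorflow_version) < MIN_TENSORFLOW:
        problems.append("TensorFlow 2.5 or later is required")
    return problems


if __name__ == "__main__":
    issues = check_environment(tuple(sys.version_info[:2]),
                               installed_tensorflow_version())
    print("Environment looks fine" if not issues else "\n".join(issues))
```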

GitHub URL

https://github.com/nl8590687/ASRT_SpeechRecognition

Open-source documentation

https://wiki.ailemon.net/docs/asrt-doc/asrt-doc-1demhoid4inc6

DeepSpeech

Features

DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.

It is not phoneme-based: DeepSpeech is an end-to-end deep learning speech system, with an optimized RNN training system that uses multiple GPUs.
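As a rough illustration of what "end-to-end" means in practice, the sketch below feeds raw 16-bit PCM audio straight into the model and gets text back, with no phoneme stage in between. It assumes the `deepspeech` 0.9 Python package and `numpy` are installed and that you have downloaded a model file; the function names and all file paths are placeholders. The WAV reader uses only the standard library.

```python
import wave
from typing import Optional, Tuple


def read_pcm16_mono(path: str) -> Tuple[int, bytes]:
    """Read a 16-bit mono WAV file and return (sample_rate, raw PCM bytes)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path: str, wav_path: str,
               scorer_path: Optional[str] = None) -> str:
    """Run speech-to-text on one WAV file (requires deepspeech and numpy)."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)  # optional external scorer

    rate, pcm = read_pcm16_mono(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"model expects {model.sampleRate()} Hz, got {rate} Hz")

    audio = np.frombuffer(pcm, dtype=np.int16)
    return model.stt(audio)


if __name__ == "__main__":
    # Placeholder paths; replace with your own model and audio files.
    print(transcribe("model.pbmm", "audio.wav"))
```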

Environment

Based on TensorFlow

GitHub URL

https://github.com/mozilla/DeepSpeech

Documentation

https://deepspeech.readthedocs.io/en/r0.9/?badge=latest

Paper reference

https://arxiv.org/abs/1412.5567

DeepSpeech2

Environment

Based on PaddlePaddle

Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
