A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
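The multitask format described above can be illustrated with a small sketch of how the special tokens might be assembled into the prefix that the decoder is conditioned on. The token spellings follow the format described in the Whisper paper; the function name `build_decoder_prompt` and its parameters are hypothetical, introduced only for illustration, and are not part of the whisper package's API.

```python
from typing import List


def build_decoder_prompt(language: str,
                         task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Assemble the special-token prefix that tells the decoder what to do.

    Simplified sketch: the real model works on token ids, not strings,
    and has further special tokens (for example, one for "no speech").
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")

    tokens = [
        "<|startoftranscript|>",  # marks the beginning of the prediction
        f"<|{language}|>",        # language token, also the language-ID target
        f"<|{task}|>",            # task specifier: transcribe or translate
    ]
    if not timestamps:
        # Ask for plain text without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    print(build_decoder_prompt("en"))
    print(build_decoder_prompt("zh", task="translate", timestamps=True))
```

Because the task is selected by tokens rather than by separate model heads, switching from English transcription to Chinese-to-English translation only changes the prefix, not the model.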
https://github.com/openai/whisper
https://openai.com/research/whisper
https://arxiv.org/abs/2212.04356
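ASRT is a deep-learning-based speech recognition system; its full name is Auto Speech Recognition Tool. It was developed by the blogger AI柠檬 and is open-sourced on GitHub under the GPL 3.0 license. The project's acoustic model uses a convolutional neural network (CNN) with connectionist temporal classification (CTC), is trained on large Chinese speech datasets, and transcribes audio into Chinese pinyin; a language model then converts the pinyin sequence into Chinese text. The model has reached 80% accuracy on the test set. Based on this model, a speech recognition application built on ASRT has been implemented for Windows, with good results in practice. The application includes a Windows 10 UWP Store app and a Windows .NET desktop app, both of which are also open-sourced on GitHub.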
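To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding, which turns per-frame acoustic model outputs into a pinyin sequence by merging repeated labels and dropping blanks. This is a generic illustration of the technique, not ASRT's actual code: the function name `ctc_greedy_decode`, the blank symbol `"_"`, and the toy label set are all assumptions.

```python
from typing import List, Optional, Sequence

BLANK = "_"  # hypothetical symbol for the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: str = BLANK) -> List[str]:
    """Pick the most likely label per frame, then collapse the result.

    Collapsing follows the CTC rule: merge consecutive repeats first,
    then remove blank labels.
    """
    decoded: List[str] = []
    previous: Optional[str] = None
    for probs in frame_probs:
        # Index of the highest-probability label for this frame.
        best_index = max(range(len(probs)), key=probs.__getitem__)
        best = labels[best_index]
        # Emit only when the label changes and is not the blank.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


if __name__ == "__main__":
    labels = [BLANK, "ni3", "hao3"]  # toy pinyin vocabulary
    frames = [
        [0.1, 0.8, 0.1],  # ni3
        [0.2, 0.7, 0.1],  # ni3 again, merged with the previous frame
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],  # hao3
        [0.1, 0.2, 0.7],  # hao3 again, merged
    ]
    print(ctc_greedy_decode(frames, labels))  # ['ni3', 'hao3']
```

In a full pipeline of the kind described above, the resulting pinyin sequence would then be passed to a language model to produce Chinese text; that stage is not shown here.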
Hardware
Software
https://github.com/nl8590687/ASRT_SpeechRecognition
https://wiki.ailemon.net/docs/asrt-doc/asrt-doc-1demhoid4inc6
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
It is not phoneme-based: it is an end-to-end deep learning speech system, with an optimized RNN training system that uses multiple GPUs.
https://github.com/mozilla/DeepSpeech
https://deepspeech.readthedocs.io/en/r0.9/?badge=latest
https://arxiv.org/abs/1412.5567
Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at: