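Lately I have been reading the material on the official KALDI website and adding annotations as I go, to help my own understanding.
These study notes are taken almost entirely from the official KALDI site, http://kaldi.sourceforge.net, with my own annotations added; I state this here for the record.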
About the Kaldi project
What is Kaldi?
Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. For more detailed history and list of contributors see History of the Kaldi project.
The name Kaldi
According to legend, Kaldi was the Ethiopian goatherder who discovered the coffee plant.
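Note: According to the legend, Kaldi was the first person to discover coffee. While herding his goats, he noticed that they became unusually lively after eating from a certain tree, and that is how coffee was discovered.
Kaldi versus other toolkits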
Kaldi is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend. Important features include:
Code-level integration with Finite State Transducers (FSTs)
Note: a finite-state transducer is a type of finite-state machine.
We compile against the OpenFst toolkit (using it as a library).
Extensive linear algebra support
Note: i.e. support for linear algebra.
We include a matrix library that wraps standard BLAS and LAPACK routines.
Note: BLAS stands for Basic Linear Algebra Subprograms; LAPACK stands for Linear Algebra PACKage.
Extensible design
As far as possible, we provide our algorithms in the most generic form possible. For instance, our decoders are templated on an object that provides a score indexed by a (frame, fst-input-symbol) tuple. This means the decoder could work from any suitable source of scores, such as a neural net.
Open license
The code is licensed under Apache 2.0, which is one of the least restrictive licenses available.
Complete recipes
Our goal is to make available complete recipes for building speech recognition systems, that work from widely available databases such as those provided by the Linguistic Data Consortium (LDC).
Note: the Linguistic Data Consortium (LDC) is a consortium that distributes language resources.
The goal of releasing complete recipes is an important aspect of Kaldi. Since the code is publicly available
under a license that permits modifications and re-release, we would like to encourage people to release
their code, along with their script directories, in a similar format to Kaldi's own example scripts.
We have tried to make Kaldi's documentation as complete as possible given time constraints, but in the
short term we cannot hope to generate documentation that is as thorough as HTK's. In particular there is a
lot of introductory material in the HTKBook, explaining statistical speech recognition for the uninitiated,
that will probably never appear in Kaldi's documentation. Much of Kaldi's documentation is written in such
a way that it will only be accessible to an expert. In the future we hope to make it somewhat more
accessible, bearing in mind that our intended audience is speech recognition researchers or researchers-in-
training. In general, Kaldi is not a speech recognition toolkit "for dummies." It will allow you to do many
kinds of operations that don't make sense.
The flavor of Kaldi
In this section we attempt to summarize some of the more generic qualities of the Kaldi toolkit. To some
extent this describes the goals of the current developers, as much as it describes the current status of the
project. It is not meant to exclude contributions from researchers whose work has a different flavor.
We emphasize generic algorithms and universal recipes
By "generic algorithms" we mean things like linear transforms, rather than those that are specific to
speech in some way. But we don't intend to be too dogmatic about this, if more specific algorithms are
useful.
We would like recipes that can be run on any data-set, rather than those that have to be customized.
We prefer provably correct algorithms
The recipes have been designed in such a way that in principle they should never fail in a catastrophic
way. There has been an effort to avoid recipes and algorithms that could possibly fail, even if they don't fail
in the "normal case" (one example: FST weight-pushing, which normally helps but can crash or make things
much worse in certain cases).
Kaldi code is thoroughly tested.
The goal is for all or nearly all the code to have corresponding test routines.
We try to keep the simple cases simple.
There is a danger when building a large speech toolkit that the code can become a forest of rarely used
alternatives. We are trying to avoid this by structuring the toolkit in the following way. Each command-line
program generally works for a limited set of cases (e.g. a decoder might just work for GMMs). Thus, when
you add a new type of model, you create a new command-line decoder (that calls the same underlying
templated code).
Kaldi code is easy to understand.
Even though the Kaldi toolkit as a whole may get very large, we aim for each individual part of it to be
understandable without too much effort. We will accept some code duplication if it improves the
understandability of individual pieces.
Kaldi code is easy to reuse and refactor.
We aim for the toolkit to be as loosely coupled as possible. In general this means that any given header
should need to #include as few other header files as possible. The matrix library, in particular, only depends
on code in one other subdirectory so it can be used independently of almost all the rest of Kaldi.
Status of the project
Currently, we have code and scripts for most standard techniques, including all standard linear transforms, MMI, boosted MMI and MCE discriminative training, and also feature-space discriminative training (like fMPE, but based on boosted MMI). We have working recipes for Wall Street Journal and Resource Management, and also for Switchboard. The Switchboard recipe is not yet giving state-of-the-art results, due to vocabulary and language model issues; we don't use any external data sources for this.
Note: after an early phase in which we intended to use version numbers for major releases of Kaldi ("v1" and so on), we realized that this type of release does not mesh well with the natural style of development, which is very continuous. Currently we maintain two major versions of Kaldi: the "trunk" version and the "stable" version. The "trunk" version is the one most people commit to, and contains the most up-to-date features but may also contain partially finished features. The "stable" version is mostly a subset of "trunk" that slightly lags in time, and has more thorough testing. We also maintain several "sandbox" versions that are for projects that are in earlier stages of development. All these versions are available from our Subversion repository on SourceForge; see Downloading and installing Kaldi for more details.
Referencing Kaldi in papers
Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]//Proc. ASRU. 2011: 1-4.