// doc/dependencies.dox
// Copyright 2009-2011 Microsoft Corporation
// 2013-2014 Johns Hopkins University (author: Daniel Povey)
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
/**
\page dependencies Software required to install and run Kaldi
\section dependencies_environment Ideal computing environment
First we describe the ideal computing environment, and then the bare minimum
you need to run Kaldi. The ideal computing environment is a cluster of Linux
machines (any major distribution) running Sun GridEngine (SGE), with access
to shared directories via NFS or a similar network filesystem. Ideally, some
machines on the grid will have NVIDIA GPUs, which you can use for neural net
training and reserve on the queue by passing an extra option to qsub.
See \ref queue for more information.
Some time ago we started a separate project called Kluster that shows you
how to create such a cluster on Amazon's EC2; however, it is not very well
maintained. MIT's StarCluster is a larger and better-supported project that
provides the same functionality. Most of the scripts should be suitable for
a locally hosted cluster based on Debian or Red Hat; you can investigate
Rocks, which aims to help you set up a cluster like that.
\section dependencies_minimum Bare minimum computing environment
The bare minimum needed to run Kaldi is any Unix-like environment. It is
possible to run it on a single machine, although of course it will be slower,
and you may have to reduce the number of jobs used in some of the example
scripts to avoid exhausting your machine's memory.
Kaldi is best tested on Debian and Red Hat Linux, but will run on any
Linux distribution, or on Cygwin or Mac OS X.
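
As a rough illustration of reducing the job count on a single machine, the
sketch below caps a requested number of jobs at the number of online CPUs.
The function name cap_jobs, the variable nj, and the requested value 30 are
hypothetical, and CPU count is only a proxy: the limit that matters in
practice may be memory, which this sketch does not measure.

```shell
# Return the smaller of a requested job count and the available CPU count.
cap_jobs() {
  requested=$1
  ncpu=$2
  if [ "$requested" -gt "$ncpu" ]; then
    echo "$ncpu"
  else
    echo "$requested"
  fi
}

# Ask the system for the number of online CPUs; fall back to 1 if the
# query fails or returns something that is not a plain number.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case "$ncpu" in
  ''|*[!0-9]*) ncpu=1 ;;
esac

# Hypothetical example: a script asks for 30 jobs.
nj=$(cap_jobs 30 "$ncpu")
echo "Using $nj parallel jobs"
```
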
Kaldi's scripts are written so that if you replace SGE with a similar
mechanism that has a different syntax (such as Torque), it should be
relatively easy to get them to work; we also provide a "dumb" replacement that
you can use when there is no queueing system (search for run.pl and ssh.pl in
the scripts).
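
To make the idea of a queue-free replacement concrete, here is a minimal
sketch of running several jobs in parallel on the local machine, with one log
file per job. It is not the implementation or the command-line interface of
run.pl; the function run_local, the environment variable JOB_ID, and the log
file naming are all assumptions made for this illustration.

```shell
# Run the same command once per job index, in parallel, on this machine.
# Usage: run_local NJOBS LOGDIR COMMAND [ARGS...]
# Each job sees its index in the environment variable JOB_ID and writes
# its stdout and stderr to LOGDIR/job.<index>.log.
run_local() {
  njobs=$1
  logdir=$2
  shift 2
  mkdir -p "$logdir"
  pids=""
  i=1
  while [ "$i" -le "$njobs" ]; do
    JOB_ID=$i "$@" >"$logdir/job.$i.log" 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Wait for every job; report failure if any of them failed.
  rc=0
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return $rc
}

# Example: three jobs, each writing its index to its own log file.
logdir=$(mktemp -d)
run_local 3 "$logdir" sh -c 'echo "job $JOB_ID"'
cat "$logdir"/job.*.log
```

A real queueing system also decides which machine each job runs on; this
sketch only covers the single-machine case described above.
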
In the past Kaldi has been compiled on Windows; however, the example scripts
will not work there, and we are not very actively maintaining the Windows
compatibility of the code or the Windows build scripts (though we do fix
problems when we are told about them).
\section dependencies_packages Software packages required
The following is a non-exhaustive list of the packages you need in order to
install Kaldi. The full list is not important, since the installation
scripts will tell you what you are missing.
- Git: this is needed to download Kaldi and other software that it depends on.
- wget: this is required to install some of the non-Kaldi components described below.
- Standard UNIX utilities such as bash, perl, awk, grep, and make: the
example scripts require these.
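
A quick way to see which of the tools listed above are already available is
to look each one up on the PATH. The sketch below is only an illustration and
is not part of Kaldi's installation scripts, which perform their own checks;
the function name missing_tools is hypothetical.

```shell
# Print the name of every listed command that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Check the tools named in the list above.
missing=$(missing_tools git wget bash perl awk grep make)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All listed tools were found."
fi
```
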
It can also be helpful to have the ATLAS linear-algebra package installed
on your system. Most systems already have it (on Linux you can also search
for an installable package with simple commands like "yum search atlas"
or "apt-cache search libatlas"); the best approach is to ignore this
requirement for now and see if you have problems when you install Kaldi.
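
If you do need to look for an ATLAS package, the following sketch prints
whichever of the two search commands quoted above matches the package manager
found on the machine. The function name atlas_search_hint is hypothetical,
and the sketch only prints a suggestion; it does not run the search or
install anything.

```shell
# Suggest a package-search command for ATLAS, depending on which
# package manager is present on this machine.
atlas_search_hint() {
  if command -v yum >/dev/null 2>&1; then
    echo "yum search atlas"
  elif command -v apt-cache >/dev/null 2>&1; then
    echo "apt-cache search libatlas"
  else
    echo "neither yum nor apt-cache found; use your system's package manager"
  fi
}

atlas_search_hint
```
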
\section dependencies_installed Software packages installed by Kaldi
The following tools and libraries come with installation scripts in the
tools/ directory so you won't have to install them yourself (note: this is a
non-exhaustive list).
- OpenFst: we compile against this and use it heavily.
- IRSTLM: this is a language modeling toolkit. Some of the example scripts require it, but
it is not tightly integrated with Kaldi; we can convert any ARPA-format
language model to an FST.
- The IRSTLM build process requires automake, aclocal, and libtoolize
(the corresponding packages are automake and libtool).
- Note: some of the example scripts now use SRILM; we make it easy to install
that, although you still have to register online to download it.
- SRILM: some of the example scripts use this. It's generally a better
and more complete language modeling toolkit than IRSTLM; the only drawback
is the license, which is not free for commercial use. You have to
enter your name on the download page to download it, so the installation
script requires some human interaction.
- sph2pipe: this is for converting sph format files into other formats such
as wav. It's needed for the example scripts that use LDC data.
- sclite: this is for scoring and is not necessary as we have our own, simple
scoring program (compute-wer.cc).
- ATLAS, the linear algebra library. This is only needed for the headers; in
typical setups we expect that ATLAS will already be on your system. However,
if it is not, you can compile ATLAS as long as your machine does not
have CPU throttling enabled.
- CLAPACK, the linear algebra library (we download the headers).
This is useful only on systems where you don't have ATLAS and are
instead compiling with CLAPACK.
- OpenBLAS: this is an alternative to ATLAS or CLAPACK. The scripts don't
use it by default but we provide installation scripts so you can install
it if you want to compare it against ATLAS (it's more actively
maintained than ATLAS).
*/