dependencies.dox 6.22 KB
// doc/dependencies.dox


// Copyright 2009-2011 Microsoft Corporation
//           2013-2014 Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

/**

  \page dependencies Software required to install and run Kaldi

    \section dependencies_environment Ideal computing environment

    First we'll explain the ideal type of computing environment, and then we'll
    say what is the bare minimum you need to run Kaldi.  The ideal computing
    environment is a cluster of Linux machines (any major distribution) running
    Sun GridEngine (SGE), with access to shared directories via NFS or some
    similar network filesystem.  In the ideal case, some computers on the
    grid will have NVidia GPUs which you can use for neural net training,
    and you can reserve these on the queue by adding some extra option to qsub.
    See \ref queue for more information.

    Some time ago we started a separate project called <a
    href=https://sourceforge.net/projects/kluster/> Kluster </a> that shows you
    how to create such a cluster on Amazon's EC2; however, this is not
    very well maintained; MIT's <a
    href="http://star.mit.edu/cluster/">StarCluster </a> is a larger and
    better-supported project that provides the same functionality.  Most of the
    scripts should be suitable for a locally hosted cluster based on Debian or
    Red Hat; you can investigate <a
    href=http://www.rocksclusters.org/wordpress/>Rocks</a> which aims to help
    you set up a cluster like that.

   \section dependencies_minimum Bare minimum computing environment

  The bare minimum computing environment to run Kaldi is any Unix-like
  environment; and it's possible to run it on a single machine, although of
  course it will be slower, and you may have to reduce the number of jobs used
  in some of the example scripts to avoid exhausting your machine's memory.

   Kaldi is best tested on Debian and Red Hat Linux, but will run on any
   Linux distribution, or on Cygwin or Mac OsX.

   Kaldi's scripts have been written in such a way that if you replace SGE with
   a similar mechanism with different syntax (such as Tork), it should be
   relatively easy to get it to work; we also provide a "dumb" replacement that
   you can use when there is no queueing system (search for run.pl and ssh.pl in
   the scripts).

   In the past Kaldi has been compiled on Windows; however, the example scripts
   will not work there, and we are not very actively maintaining the Windows
   compatibility of the code or the Windows build scripts (we fix problems when
   we are told about them though).

   \section dependencies_packages Software packages required

   The following is a non-exhaustive list of some of the packages you need in
   order to install Kaldi.  The full list is not important since the installation
   scripts will tell you what you are missing.

    - Git: this is needed to download Kaldi and other software that it depends on.
    - wget is required for the installation of some non-Kaldi components described below
    - The example scripts require standard UNIX utilities such as bash,
      perl, awk, grep, and make.

   It can also be helpful if you have an ATLAS linear-algebra package installed
   on your system.  Most systems already have this (You can also search the
   packages in linux for installation by simple commands like "yum search atlas"
   or "apt-cache search libatlas"); the best approach is to ignore this
   requirement for now and see if you have problems when you install Kaldi.

    \section dependencies_installed Software packages installed by Kaldi

   The following tools and libraries come with installation scripts in the
   tools/ directory so you won't have to install them yourself (note: this is a
   non-exhaustive list).

    - OpenFst:  we compile against this and use it heavily.
    - IRSTLM: this a language modeling toolkit.  Some of the example scripts require it but
      it is not tightly integrated with Kaldi; we can convert any Arpa format
      language model to an FST.
      - The IRSTLM build process requires automake, aclocal, and libtoolize
        (the corresponding packages are automake and libtool).
      - Note: some of the example scripts now use SRILM; we make it easy to install
        that, although you still have to register online to download it.
    - SRILM: some of the example scripts use this.  It's generally a better
      and more complete language modeling toolkit than IRSTLM; the only drawback
      is the license, which is not free for commercial use.  You have to
      enter your name on the download page to download it, so the installation
      script requires some human interaction.
    - sph2pipe: this is for converting sph format files into other formats such
      as wav.  It's needed for the example scripts that use LDC data.
    - sclite: this is for scoring and is not necessary as we have our own, simple
      scoring program (compute-wer.cc).
    - ATLAS, the linear algebra library.  This is only needed for the headers; in
      typical setups we expect that ATLAS will be on your system.  However, if it not
      already on your system you can compile ATLAS as long as your machine does not
      have CPU throttling enabled.
    - CLAPACK, the linear algebra library (we download the headers).
      This is useful only on systems where you don't have ATLAS and are
      instead compiling with CLAPACK.
    - OpenBLAS: this is an alternative to ATLAS or CLAPACK.  The scripts don't
      use it by default but we provide installation scripts so you can install
      it if you want to compare it against ATLAS (it's more actively
      maintained than ATLAS).

*/