// doc/dnn.dox
// Copyright 2013-2015 Johns Hopkins University (author: Daniel Povey)
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS
// OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY
// IMPLIED WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
namespace kaldi {
/**
\page dnn Deep Neural Networks in Kaldi
\section dnn_intro Introduction
Deep Neural Networks (DNNs) are the latest hot topic in speech recognition.
Since around 2010, many papers have been published in this area, and some of
the largest companies (e.g. Google, Microsoft) are starting to use DNNs in
their production systems. An active area of research like this is difficult
for a toolkit like Kaldi to support well: the state of the art changes
constantly, which means code changes are required to keep up, and
architectural decisions may need to be rethought.
We currently have three separate codebases for deep neural nets in Kaldi. All
are still active in the sense that the up-to-date recipes refer to all of
them. The first one ("nnet1") is located in the code subdirectories nnet/ and
nnetbin/, and is primarily maintained by Karel Vesely. The second is located
in the code subdirectories nnet2/ and nnet2bin/, and is primarily maintained by
Daniel Povey (this code was originally based on an earlier version of Karel's
code, but it has been extensively rewritten). The third is located in the code
subdirectories nnet3/ and nnet3bin/; Dan's development effort is shifting from
the \ref dnn2 "nnet2" setup to the \ref dnn3 "nnet3" setup.
Neural net example scripts can be found in the example directories such as
egs/wsj/s5, egs/rm/s5, egs/swbd/s5 and egs/hkust/s5b. Karel's example script
is local/nnet/run_dnn.sh, and Dan's example script is local/run_nnet2.sh.
Before running those scripts, the first stages of ``run.sh'' in those
directories must be run in order to build the systems used for alignment;
a sketch of this workflow is shown below.
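For concreteness, here is a minimal sketch of that workflow, using the WSJ
directory as an example (running ``run.sh'' in full is illustrative; in
practice you may run only the stages needed to produce the alignments):
\verbatim
cd egs/wsj/s5
# Build the GMM/HMM systems whose alignments the DNN training uses
# (in practice, run only the relevant early stages of this script).
./run.sh
# Karel's setup (nnet1):
local/nnet/run_dnn.sh
# Dan's setup (nnet2):
local/run_nnet2.sh
\endverbatim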
Regarding which of the setups you should use:
- Karel's setup (\ref dnn1 "nnet1") supports training on a single GPU card, which allows
the implementation to be simpler and relatively easy to modify.
- Dan's setup (\ref dnn2 "nnet2") is more flexible in how
you can train: it supports using multiple GPUs, or multiple CPUs each with
multiple threads. Multiple GPUs, which do not all have to be on the same
machine, is the recommended configuration. The two setups give comparable results.
- The ``new'' setup (\ref dnn3 "nnet3") is intended (at least, by Dan)
to be the recommended path going forward; how functional it is may depend on when
you read this.
The recipes also differ between the setups in many ways. For example, Karel's setup uses pre-training
while Dan's setup does not; Karel's setup uses early stopping based on a validation set, while
Dan's setup uses a fixed number of epochs and averages the parameters over the last
few epochs of training. Most other details of the training (nonlinearity types, learning
rate schedules, network topology, input features, etc.) also differ.
The best published descriptions of the DNN setups are:
- Karel's setup : <a href="http://www.fit.vutbr.cz/research/groups/speech/publi/2013/vesely_interspeech2013_IS131333.pdf">Sequence-discriminative training of deep neural networks</a>
- Dan's setup : <a href="http://arxiv-web3.library.cornell.edu/pdf/1410.7455v4">Parallel training of DNNs with natural gradient and parameter averaging</a>
The setups use incompatible DNN formats, although there is a converter from Karel's network
format to Dan's format (see \ref dnn1_conversion_to_dnn2); a hypothetical invocation is sketched below.
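As an illustration only (the script path and the experiment directory names
below are assumptions, not taken from this page; see \ref dnn1_conversion_to_dnn2
for the authoritative usage), such a conversion might look like:
\verbatim
# Hypothetical: convert a trained nnet1 model directory into nnet2 format.
# The script path and the directory names are assumptions.
steps/nnet2/convert_nnet1_to_nnet2.sh exp/dnn4_pretrain-dbn_dnn exp/dnn4_nnet2
\endverbatim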
- Documentation for Karel's version is available at \subpage dnn1.
- Documentation for Dan's old version is available at \subpage dnn2.
- Documentation for the nnet3 setup is available at \subpage dnn3.
- Documentation for the 'nnet3+chain' setup is available at \subpage chain.
*/
}