cudamatrix.dox 5.57 KB
edit raw blame history



1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113


// doc/cudamatrix.dox


// Copyright 2012  Karel Vesely
//           2015  Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

namespace kaldi {
/**
  \page cudamatrix The CUDA Matrix library

 The CUDA matrix library provides access to GPU-based matrix operations with
 an interface similar to \ref matrix.
 The general principle is that if you want to be able to run a particular part
 of the computation the GPU, you would declare the relevant quantities as
 type CuMatrix or CuVector instead of Matrix or Vector.  Then, if you have
 configured Kaldi to use the GPU and if the Kaldi program you are running
 has initialized access to the GPU, those operations will run on the GPU.
 Otherwise, they will run on the CPU.  CuMatrix and CuVector quantities
 store their contents in GPU memory space, if you have configured for GPU
 and your program has initialized the GPU device.

 You can't mix CuMatrix and CuVector with Matrix and Vector in matrix operations,
 because they live in different memory spaces, but you can copy from one to
 the other.  Kaldi does not try to automatically decide which operations are
 best done on GPU: it is all under the control of the programmer.


  \subsection cudamatrix_configuration Configuring the CUDA matrix library

 If the <tt>configure</tt> script sees that the NVidia compilation tool <tt>nvcc</tt> is on
 the path when it is run, it assumes you want to compile for GPU, and will define
 <tt>HAVE_CUDA=1</tt> and set other Makefile variables to enable GPU compilation.
 You can disable this if you don't want it by calling <tt>configure</tt> with
 <tt>--use-cuda=no</tt>.  If the script doesn't find the location where you installed
 the CUDA toolkit but you want to use it, you can use an option like
 <tt>--cudatk-dir=/opt/cuda-4.2</tt>.
 If you want to tell whether Kaldi has been configured to use CUDA, you can
 grep for <tt>CUDATKDIR</tt> in <tt>kaldi.mk</tt>; if the string appears, then it has
 been configured to use CUDA.  In scripts, you can check the return status of
 the program <tt>cuda-compiled</tt>: it returns success (0) if you compiled for CUDA.

 You can also tell from the logs whether a program is using the GPU.  If it is using
 the GPU, you'll see lines like this near the top of the program's output:
\verbatim
LOG (nnet-train-simple:IsComputeExclusive():cu-device.cc:229) CUDA setup operating under Compute Exclusive Mode.
LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [1]: Tesla K10.G2.8GB  \
    free:3519M, used:64M, total:3583M, free/total:0.982121 version 3.0
\endverbatim
 In addition to configuring at the Makefile to use CUDA, if any individual
 program wants to use GPU operations it needs to have code like the following:
\verbatim
#if HAVE_CUDA==1
    CuDevice::Instantiate().SelectGpuId(use_gpu);
#endif
\endverbatim
where <tt>use_gpu</tt> is a string, typically a command-line option, that can take
the following values:

  - <tt>"yes"</tt>: use the GPU (or crash if one is not available).
  - <tt>"no"</tt> don't use the GPU.
  - <tt>"optional"</tt> use the GPU if the machine it's running on has GPUs attached.
  - <tt>"wait"</tt>: like <tt>"yes"</tt> but if the GPUs are running other processes,
       the program will wait indefinitely until one becomes free.

If a program doesn't take the <tt>--use-gpu</tt> command line option, that generally
means that it hasn't been programmed to support the use of GPU operations, even if the code
it runs contains the CuVector and CuMatrix types.  Usually we only run specific tasks
on the GPU- mainly neural net training.

\subsection cudamatrix_modes GPU compute modes

 NVidia GPUs (which is the only kind Kaldi supports) have various "compute modes":
 "default", "process exclusive", "thread exclusive".  This controls whether or not the
 GPU is configured to run multiple processes at the same time.  Kaldi is intended
 to be run in "exclusive mode"; whether it's process exclusive or thread exclusive doesn't
 matter.  You can find out what mode your GPU is running in as follows:
\verbatim
# nvidia-smi  --query | grep 'Compute Mode'
    Compute Mode                    : Exclusive_Process
\endverbatim
 You can set the correct mode by typing <tt>nvidia-smi -c 3</tt>.  You might want to
 do this in a startup script so it happens each time you reboot.

 Rather than calling the malloc and free functions that NVidia provides, Kaldi
 does caching of previously released memory so that we don't have to incur the
 overhead of NVidia's malloc.  This was done because at one point we were running
 in Amazon's cloud and found that NVidia's malloc was very slow.  This was probably
 caused by the virtualization, and we're not sure whether that problem still exists.
 Anyway, the memory caching can cause a problem if for some reason you run using
 the default (non-exclusive) compute mode, because it can cause allocation
 failures.  You can disable it at the code level by calling
 <tt>CuDevice::Instantiate().DisableCaching()</tt>, if needed.


*/

}