  // doc/cudamatrix.dox
  
  
  // Copyright 2012  Karel Vesely
  //           2015  Johns Hopkins University (author: Daniel Povey)
  
  // See ../../COPYING for clarification regarding multiple authors
  //
  // Licensed under the Apache License, Version 2.0 (the "License");
  // you may not use this file except in compliance with the License.
  // You may obtain a copy of the License at
  
  //  http://www.apache.org/licenses/LICENSE-2.0
  
  // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
  // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
  // MERCHANTABLITY OR NON-INFRINGEMENT.
  // See the Apache 2 License for the specific language governing permissions and
  // limitations under the License.
  
  namespace kaldi {
  /**
    \page cudamatrix The CUDA Matrix library
  
   The CUDA matrix library provides access to GPU-based matrix operations with
   an interface similar to \ref matrix.
   The general principle is that if you want to run a particular part
   of the computation on the GPU, you declare the relevant quantities as
   type CuMatrix or CuVector instead of Matrix or Vector.  Then, if you have
   configured Kaldi to use the GPU and if the Kaldi program you are running
   has initialized access to the GPU, those operations will run on the GPU.
   Otherwise, they will run on the CPU.  CuMatrix and CuVector quantities
   store their contents in GPU memory space, if you have configured for GPU
   and your program has initialized the GPU device.
  
   You can't mix CuMatrix and CuVector with Matrix and Vector in matrix operations,
   because they live in different memory spaces, but you can copy from one to
   the other.  Kaldi does not try to automatically decide which operations are
   best done on GPU: it is all under the control of the programmer.
  
  
    \subsection cudamatrix_configuration Configuring the CUDA matrix library
  
   If the <tt>configure</tt> script sees that the NVidia compilation tool <tt>nvcc</tt> is on
   the path when it is run, it assumes you want to compile for GPU, and will define
   <tt>HAVE_CUDA=1</tt> and set other Makefile variables to enable GPU compilation.
   You can disable this if you don't want it by calling <tt>configure</tt> with
   <tt>--use-cuda=no</tt>.  If the script doesn't find the location where you installed
   the CUDA toolkit but you want to use it, you can use an option like
   <tt>--cudatk-dir=/opt/cuda-4.2</tt>.
   If you want to tell whether Kaldi has been configured to use CUDA, you can
   grep for <tt>CUDATKDIR</tt> in <tt>kaldi.mk</tt>; if the string appears, then it has
   been configured to use CUDA.  In scripts, you can check the return status of
   the program <tt>cuda-compiled</tt>: it returns success (0) if you compiled for CUDA.
  
   You can also tell from the logs whether a program is using the GPU.  If it is using
   the GPU, you'll see lines like this near the top of the program's output:
  \verbatim
  LOG (nnet-train-simple:IsComputeExclusive():cu-device.cc:229) CUDA setup operating under Compute Exclusive Mode.
  LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [1]: Tesla K10.G2.8GB  \
      free:3519M, used:64M, total:3583M, free/total:0.982121 version 3.0
  \endverbatim
   In addition to configuring CUDA at the Makefile level, any individual
   program that wants to use GPU operations needs code like the following:
  \verbatim
  #if HAVE_CUDA==1
      CuDevice::Instantiate().SelectGpuId(use_gpu);
  #endif
  \endverbatim
  where <tt>use_gpu</tt> is a string, typically a command-line option, that can take
  the following values:
  
    - <tt>"yes"</tt>: use the GPU (or crash if one is not available).
    - <tt>"no"</tt>: don't use the GPU.
    - <tt>"optional"</tt>: use the GPU if the machine it's running on has GPUs attached.
    - <tt>"wait"</tt>: like <tt>"yes"</tt> but if the GPUs are running other processes,
         the program will wait indefinitely until one becomes free.
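  
  The handling of these four values can be pictured with a small, self-contained
  sketch.  It is not the implementation of <tt>SelectGpuId</tt>: the names
  <tt>GpuPolicy</tt>, <tt>Outcome</tt>, <tt>ParseUseGpu</tt> and <tt>Decide</tt> are
  hypothetical, and the sketch assumes a single flag saying whether a free GPU was
  found, which collapses "attached" and "free" into one condition.
```cpp
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only; this is not Kaldi's API.
enum class GpuPolicy { kYes, kNo, kOptional, kWait };
enum class Outcome { kUseGpu, kUseCpu, kFail, kRetryLater };

// Maps the --use-gpu string to a policy and rejects any other value.
inline GpuPolicy ParseUseGpu(const std::string &use_gpu) {
  if (use_gpu == "yes") return GpuPolicy::kYes;
  if (use_gpu == "no") return GpuPolicy::kNo;
  if (use_gpu == "optional") return GpuPolicy::kOptional;
  if (use_gpu == "wait") return GpuPolicy::kWait;
  throw std::invalid_argument("bad --use-gpu value: " + use_gpu);
}

// Decides what to do given the policy and whether a free GPU was found.
// Assumption: one flag stands for both "a GPU is attached" and "it is free".
inline Outcome Decide(GpuPolicy policy, bool gpu_free) {
  switch (policy) {
    case GpuPolicy::kNo:
      return Outcome::kUseCpu;  // never touch the GPU
    case GpuPolicy::kYes:
      return gpu_free ? Outcome::kUseGpu : Outcome::kFail;
    case GpuPolicy::kOptional:
      return gpu_free ? Outcome::kUseGpu : Outcome::kUseCpu;
    case GpuPolicy::kWait:
      // Stands in for "wait until a GPU becomes free".
      return gpu_free ? Outcome::kUseGpu : Outcome::kRetryLater;
  }
  return Outcome::kFail;  // not reached for valid enum values
}
```
  For example, <tt>Decide(ParseUseGpu("optional"), false)</tt> yields
  <tt>Outcome::kUseCpu</tt>, whereas the same call with <tt>"yes"</tt> yields
  <tt>Outcome::kFail</tt>.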
  
  If a program doesn't take the <tt>--use-gpu</tt> command line option, that generally
  means that it hasn't been programmed to support the use of GPU operations, even if the code
  it runs contains the CuVector and CuMatrix types.  Usually we only run specific tasks
  on the GPU, mainly neural net training.
  
  \subsection cudamatrix_modes GPU compute modes
  
   NVidia GPUs (the only kind Kaldi supports) have several "compute modes", including
   "default", "process exclusive" and "thread exclusive".  The compute mode controls
   whether or not the GPU may run multiple processes at the same time.  Kaldi is intended
   to be run in "exclusive mode"; whether it's process exclusive or thread exclusive doesn't
   matter.  You can find out what mode your GPU is running in as follows:
  \verbatim
  # nvidia-smi  --query | grep 'Compute Mode'
      Compute Mode                    : Exclusive_Process
  \endverbatim
   You can set the correct mode by typing <tt>nvidia-smi -c 3</tt> (mode 3 is process
   exclusive).  You might want to do this in a startup script so that it happens each
   time you reboot.
  
   Rather than calling the malloc and free functions that NVidia provides each time,
   Kaldi caches previously released memory so that it does not incur the
   overhead of NVidia's malloc.  This was done because at one point we were running
   in Amazon's cloud and found that NVidia's malloc was very slow.  This was probably
   caused by the virtualization, and we're not sure whether that problem still exists.
   However, the memory caching can cause allocation failures if for some reason
   you run in the default (non-exclusive) compute mode.  If needed, you can disable
   it at the code level by calling
   <tt>CuDevice::Instantiate().DisableCaching()</tt>.
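  
   The caching idea can be illustrated with a minimal host-side sketch, assuming a
   cache keyed by exact allocation size.  This is not Kaldi's allocator: the class
   <tt>CachingAllocator</tt> and its methods are hypothetical, and
   <tt>std::malloc</tt> and <tt>std::free</tt> stand in for the device allocation calls.
```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of caching released memory; not Kaldi's code.
// std::malloc and std::free stand in for the (slow) device allocator.
class CachingAllocator {
 public:
  CachingAllocator() = default;
  CachingAllocator(const CachingAllocator &) = delete;
  CachingAllocator &operator=(const CachingAllocator &) = delete;

  ~CachingAllocator() {
    // Hand every cached block back to the underlying allocator.
    for (auto &entry : free_blocks_)
      for (void *p : entry.second) std::free(p);
  }

  // Reuses a cached block of exactly `size` bytes if one exists;
  // otherwise falls back to the underlying allocator.
  void *Malloc(std::size_t size) {
    auto it = free_blocks_.find(size);
    if (it != free_blocks_.end() && !it->second.empty()) {
      void *p = it->second.back();
      it->second.pop_back();
      ++num_cache_hits_;
      return p;
    }
    return std::malloc(size);
  }

  // Keeps the block for later reuse instead of releasing it.
  void Free(void *p, std::size_t size) {
    if (p != nullptr) free_blocks_[size].push_back(p);
  }

  int NumCacheHits() const { return num_cache_hits_; }

 private:
  std::unordered_map<std::size_t, std::vector<void *>> free_blocks_;
  int num_cache_hits_ = 0;
};
```
   In this sketch, released blocks stay held by the process until the allocator is
   destroyed.  That is presumably why caching interacts badly with a GPU shared
   between processes: memory that one process has "freed" is still unavailable to
   the others.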
  
  
  
  */
  
  }