// doc/cudamatrix.dox // Copyright 2012 Karel Vesely // 2015 Johns Hopkins University (author: Daniel Povey) // See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License. namespace kaldi { /** \page cudamatrix The CUDA Matrix library The CUDA matrix library provides access to GPU-based matrix operations with an interface similar to \ref matrix. The general principle is that if you want to be able to run a particular part of the computation the GPU, you would declare the relevant quantities as type CuMatrix or CuVector instead of Matrix or Vector. Then, if you have configured Kaldi to use the GPU and if the Kaldi program you are running has initialized access to the GPU, those operations will run on the GPU. Otherwise, they will run on the CPU. CuMatrix and CuVector quantities store their contents in GPU memory space, if you have configured for GPU and your program has initialized the GPU device. You can't mix CuMatrix and CuVector with Matrix and Vector in matrix operations, because they live in different memory spaces, but you can copy from one to the other. Kaldi does not try to automatically decide which operations are best done on GPU: it is all under the control of the programmer. \subsection cudamatrix_configuration Configuring the CUDA matrix library If the configure script sees that the NVidia compilation tool nvcc is on the path when it is run, it assumes you want to compile for GPU, and will define HAVE_CUDA=1 and set other Makefile variables to enable GPU compilation. You can disable this if you don't want it by calling configure with --use-cuda=no. If the script doesn't find the location where you installed the CUDA toolkit but you want to use it, you can use an option like --cudatk-dir=/opt/cuda-4.2. If you want to tell whether Kaldi has been configured to use CUDA, you can grep for CUDATKDIR in kaldi.mk; if the string appears, then it has been configured to use CUDA. In scripts, you can check the return status of the program cuda-compiled: it returns success (0) if you compiled for CUDA. You can also tell from the logs whether a program is using the GPU. If it is using the GPU, you'll see lines like this near the top of the program's output: \verbatim LOG (nnet-train-simple:IsComputeExclusive():cu-device.cc:229) CUDA setup operating under Compute Exclusive Mode. LOG (nnet-train-simple:FinalizeActiveGpu():cu-device.cc:194) The active GPU is [1]: Tesla K10.G2.8GB \ free:3519M, used:64M, total:3583M, free/total:0.982121 version 3.0 \endverbatim In addition to configuring at the Makefile to use CUDA, if any individual program wants to use GPU operations it needs to have code like the following: \verbatim #if HAVE_CUDA==1 CuDevice::Instantiate().SelectGpuId(use_gpu); #endif \endverbatim where use_gpu is a string, typically a command-line option, that can take the following values: - "yes": use the GPU (or crash if one is not available). - "no" don't use the GPU. - "optional" use the GPU if the machine it's running on has GPUs attached. - "wait": like "yes" but if the GPUs are running other processes, the program will wait indefinitely until one becomes free. If a program doesn't take the --use-gpu command line option, that generally means that it hasn't been programmed to support the use of GPU operations, even if the code it runs contains the CuVector and CuMatrix types. Usually we only run specific tasks on the GPU- mainly neural net training. \subsection cudamatrix_modes GPU compute modes NVidia GPUs (which is the only kind Kaldi supports) have various "compute modes": "default", "process exclusive", "thread exclusive". This controls whether or not the GPU is configured to run multiple processes at the same time. Kaldi is intended to be run in "exclusive mode"; whether it's process exclusive or thread exclusive doesn't matter. You can find out what mode your GPU is running in as follows: \verbatim # nvidia-smi --query | grep 'Compute Mode' Compute Mode : Exclusive_Process \endverbatim You can set the correct mode by typing nvidia-smi -c 3. You might want to do this in a startup script so it happens each time you reboot. Rather than calling the malloc and free functions that NVidia provides, Kaldi does caching of previously released memory so that we don't have to incur the overhead of NVidia's malloc. This was done because at one point we were running in Amazon's cloud and found that NVidia's malloc was very slow. This was probably caused by the virtualization, and we're not sure whether that problem still exists. Anyway, the memory caching can cause a problem if for some reason you run using the default (non-exclusive) compute mode, because it can cause allocation failures. You can disable it at the code level by calling CuDevice::Instantiate().DisableCaching(), if needed. */ }