// doc/dnn3_code_optimization.dox

// Copyright 2015   Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

namespace kaldi {
namespace nnet3 {
/**
  \page dnn3_code_optimization Optimization in the "nnet3" setup

  \section dnn3_code_optimization_intro Introduction

  This page covers the code-optimization process in the "nnet3" setup, in which
  we modify the sequence of commands stored in the NnetComputation object in
  order to make the execution more efficient.

  - Previous: \ref dnn3_code_compilation
  - Up: \ref dnn3

  \section dnn3_optimize_overview Overview of optimization

  The optimization process is something that happens after compilation. It
  consists of modifying the NnetComputation to make it more efficient. From the
  point of view of the user it is just a single function call:
\verbatim
void Optimize(const NnetOptimizeConfig &config,
              const Nnet &nnet,
              const ComputationRequest &request,
              NnetComputation *computation);
\endverbatim
  Internally, this performs various different types of optimizations, which we
  will go through below. We also discuss code analysis, which is used in the
  optimization code to help figure out which changes to the code are
  permissible; and code checking.

  This page is organized as:
  - \ref dnn3_optimize_analysis
  - \ref dnn3_optimize_checking
  - \ref dnn3_optimize_optimization

  \section dnn3_optimize_analysis Code analysis

  As mentioned, nnet-analyze.h contains various utilities for analyzing code.
  For convenience we define a struct Analyzer, which runs all of this analysis
  on some compiled code:
\verbatim
struct Analyzer {
  ComputationVariables variables;
  std::vector<CommandAttributes> command_attributes;
  std::vector<std::vector<Access> > variable_accesses;
  std::vector<MatrixAccesses> matrix_accesses;
  void Init(const Nnet &nnet, const NnetComputation &computation);
};
\endverbatim
  The Init function sets up its members:
\verbatim
void Analyzer::Init(const Nnet &nnet, const NnetComputation &computation) {
  variables.Init(computation);
  ComputeCommandAttributes(nnet, computation, variables, &command_attributes);
  ComputeVariableAccesses(variables, command_attributes, &variable_accesses);
  ComputeMatrixAccesses(nnet, computation, variables, command_attributes,
                        &matrix_accesses);
}
\endverbatim
  There are four function calls here; the first, variables.Init(), sets up the
  member of type ComputationVariables.

  \subsection dnn3_optimize_analysis_variables Computation variables

  In order to analyze the computation, it is helpful to break it down into
  actions on individual variables. If efficiency were not an issue, we could do
  the analysis at the level of individual elements of a matrix, declaring these
  to be the variables. However, for now we do the analysis at a slightly
  coarser-grained level, consisting of column ranges of matrices. We choose the
  coarsest set of column ranges such that the column range of every submatrix
  can be expressed exactly as a union of these column ranges. Class
  ComputationVariables is responsible for identifying these column ranges and
  for getting the set of variables associated with a given matrix or
  sub-matrix. Note that in the future we may decide to do the analysis at a
  finer level (e.g.
  individual rows of the current variables), which would enable more complete
  optimization in certain rather specialized circumstances. This would not
  involve very extensive changes to the code outside of class
  ComputationVariables.

  A "variable" is a zero-based index that corresponds to one of the variables
  identified by class ComputationVariables. The public interface of this class
  is below:
\verbatim
class ComputationVariables {
 public:
  void Init(const NnetComputation &computation);

  // This function updates the CommandAttributes object to record an access of
  // type read, write or read-write on the variables that this sub-matrix
  // corresponds to, and also updates the matrices_accessed variable by adding
  // the number of the underlying matrix.
  void RecordAccessForSubmatrix(
      int32 submatrix_index,
      AccessType access_type,
      CommandAttributes *ca) const;

  // Appends to variable_indexes the list of variables corresponding to a
  // matrix index.
  void AppendVariablesForMatrix(
      int32 matrix_index,
      std::vector<int32> *variable_indexes) const;

  int32 NumVariables() const { return num_variables_; }

  int32 GetMatrixForVariable(int32 variable) const;
 private:
  ...
};
\endverbatim
  The \ref ComputationVariables::RecordAccessForSubmatrix() "RecordAccessForSubmatrix()"
  function is not self-explanatory at this point, because we have not yet
  introduced struct CommandAttributes. We say more about it below.

  \subsection dnn3_optimize_analysis_attributes Command attributes

  Struct CommandAttributes records, for a single command, which variables are
  read and which are written, and likewise which matrices are read and written.
\verbatim
struct CommandAttributes {
  // variables read
  std::vector<int32> variables_read;
  // variables written
  std::vector<int32> variables_written;
  // matrices read
  std::vector<int32> matrices_read;
  // matrices written
  std::vector<int32> matrices_written;

  // true if this command has side effects e.g. on the model (such as
  // Backprop on an updatable component, or StoreStats).
  bool has_side_effects;
  CommandAttributes(): has_side_effects(false) { }
};
\endverbatim
  Some operations must be considered read/write instead of just read or write.
  For instance, adding something to a matrix is a read/write operation because
  the final result depends on what was there previously. In these cases we add
  the variable (or matrix) to both the read and the written lists. In addition,
  a pure-write operation that accesses only part of a variable or matrix must
  be considered a read/write operation on that variable or matrix, because the
  final value still depends on the contents at the start.

  The function ComputationVariables::RecordAccessForSubmatrix() is responsible
  for updating the CommandAttributes object of a command; it is declared as
  follows:
\verbatim
  void RecordAccessForSubmatrix(
      int32 submatrix_index,
      AccessType access_type,
      CommandAttributes *ca) const;
\endverbatim
  where AccessType is an enum that can take the values kReadAccess,
  kWriteAccess and kReadWriteAccess.

  \subsection dnn3_optimize_analysis_attributes_computing Computing the command attributes

  After initializing the ComputationVariables object, the next stage in the
  analysis of a computation is to obtain a vector of CommandAttributes, one for
  each command in the computation. The function \ref ComputeCommandAttributes()
  is responsible for this.
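
  To make the read, write and read/write rules described above concrete, here
  is a minimal self-contained sketch. It is not the Kaldi implementation:
  ToyAttributes, RecordAccess() and the covers_whole_variable flag are
  hypothetical names introduced only for illustration, and the real code works
  on sub-matrix indexes rather than on a boolean flag.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins that mirror the names used in the text; illustration only.
enum AccessType { kReadAccess, kWriteAccess, kReadWriteAccess };

struct ToyAttributes {
  std::vector<int> variables_read;
  std::vector<int> variables_written;
};

// Records one access to one variable.  'covers_whole_variable' is a
// hypothetical flag saying whether the accessed region spans the entire
// variable.
void RecordAccess(int variable, AccessType type, bool covers_whole_variable,
                  ToyAttributes *ca) {
  assert(ca != nullptr);
  // A write that touches only part of a variable leaves the rest unchanged,
  // so the final value depends on the old contents: treat it as read/write.
  if (type == kWriteAccess && !covers_whole_variable)
    type = kReadWriteAccess;
  // Read and read/write accesses go on the read list ...
  if (type != kWriteAccess)
    ca->variables_read.push_back(variable);
  // ... and write and read/write accesses go on the written list.
  if (type != kReadAccess)
    ca->variables_written.push_back(variable);
}
```

  In this sketch a full write appears only on the written list, whereas a
  partial write appears on both lists, which is the promotion to read/write
  that the text describes.
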
  The ComputeCommandAttributes() function is mostly a big switch statement; we
  show the first part of it to give the reader some idea of what is going on:
\verbatim
void ComputeCommandAttributes(
    const Nnet &nnet,
    const NnetComputation &computation,
    const ComputationVariables &vars,
    std::vector<CommandAttributes> *attributes) {
  int32 num_commands = computation.commands.size();
  attributes->clear();
  attributes->resize(num_commands);
  for (int32 command_index = 0; command_index < num_commands; command_index++) {
    const NnetComputation::Command &c = computation.commands[command_index];
    CommandAttributes &attr = (*attributes)[command_index];
    switch (c.command_type) {
      case NnetComputation::kAllocMatrixZeroed:
        vars.AppendVariablesForMatrix(c.arg1, &attr.variables_written);
        break;
      case NnetComputation::kAllocMatrixUndefined: // nothing is written here.
        break;
      case NnetComputation::kDeallocMatrix: // ditto.
        break;
      case NnetComputation::kPropagate:
        vars.RecordAccessForSubmatrix(c.arg3, kReadAccess, &attr);
        if (nnet.GetComponent(c.arg1)->Properties() & kPropagateAdds)
          vars.RecordAccessForSubmatrix(c.arg4, kReadWriteAccess, &attr);
        else
          vars.RecordAccessForSubmatrix(c.arg4, kWriteAccess, &attr);
        break;
       ...
\endverbatim

  \subsection dnn3_optimize_analysis_variable_accesses Computing the variable accesses

  The next stage in the analysis is to compute the variable accesses. This
  takes the information we stored in the CommandAttributes and lists it per
  variable. We define a struct Access as:
\verbatim
struct Access {
  int32 command_index;
  AccessType access_type;
};
\endverbatim
  where AccessType is the enumeration type mentioned above.
  The accesses to any variable are stored as a std::vector<Access>, and the
  function that computes the accesses for all variables is declared as follows:
\verbatim
void ComputeVariableAccesses(
    const ComputationVariables &variables,
    const std::vector<CommandAttributes> &command_attributes,
    std::vector<std::vector<Access> > *variable_accesses);
\endverbatim
  The output variable_accesses is indexed first by variable (so its length is
  the number of variables); each element is a list of accesses sorted by
  command index. There is at most one access per command, because we
  consolidate them; for example, a combination of a read and a write access is
  consolidated into a single kReadWriteAccess.

  \subsection dnn3_optimize_analysis_variable_matrix Computing the matrix accesses

  Struct MatrixAccesses stores all the information we record for a single
  matrix, relating to how it is allocated and accessed:
\verbatim
struct MatrixAccesses {
  // Index of the command that allocates the matrix, or -1 if the command
  // doesn't exist (e.g. it is an input).
  int32 allocate_command;
  // Index of the command that deallocates the matrix, or -1 if it never gets
  // deallocated (e.g. it is an output).
  int32 deallocate_command;
  // Records the indexes of commands that access the matrix, and the type
  // (read, read/write, write).  It will be sorted on command index with only
  // one record per command.  Note: a write to only a part of the matrix
  // (i.e. a submatrix that isn't the whole thing) will be recorded as an
  // access of type read/write.
  std::vector<Access> accesses;
  // true if this matrix is an input to the computation.
  bool is_input;
  // true if this matrix is an output of the computation.
  bool is_output;
  MatrixAccesses(): allocate_command(-1), deallocate_command(-1),
                    is_input(false), is_output(false) { }
};
\endverbatim
  You can see that we store more information than we do for the variables
  (i.e. more than just the std::vector<Access>). This is so that we can check
  whether the matrix is being allocated and deallocated appropriately.
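
  The per-variable (and, analogously, per-matrix) access lists described above
  can be sketched as follows. This is a simplified illustration under stated
  assumptions, not the Kaldi code: ToyAttributes and
  ToyComputeVariableAccesses() are hypothetical names, plain int replaces
  int32, and the number of variables is passed in directly rather than
  obtained from a ComputationVariables object.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum AccessType { kReadAccess, kWriteAccess, kReadWriteAccess };

struct Access {
  int command_index;
  AccessType access_type;
};

// Toy stand-in for CommandAttributes, restricted to variables.
struct ToyAttributes {
  std::vector<int> variables_read;
  std::vector<int> variables_written;
};

// Returns true if 'value' occurs in 'vec'.
static bool Contains(const std::vector<int> &vec, int value) {
  return std::find(vec.begin(), vec.end(), value) != vec.end();
}

// Turns per-command attributes into per-variable access lists.  Commands are
// visited in increasing order, so each list ends up sorted by command index,
// and a read plus a write in the same command becomes one read/write access.
std::vector<std::vector<Access> > ToyComputeVariableAccesses(
    int num_variables, const std::vector<ToyAttributes> &attributes) {
  assert(num_variables >= 0);
  std::vector<std::vector<Access> > accesses(num_variables);
  for (int c = 0; c < static_cast<int>(attributes.size()); c++) {
    const ToyAttributes &attr = attributes[c];
    for (int v = 0; v < num_variables; v++) {
      bool is_read = Contains(attr.variables_read, v);
      bool is_written = Contains(attr.variables_written, v);
      if (!is_read && !is_written) continue;  // command c does not touch v.
      Access access;
      access.command_index = c;
      if (is_read && is_written) access.access_type = kReadWriteAccess;
      else if (is_read) access.access_type = kReadAccess;
      else access.access_type = kWriteAccess;
      accesses[v].push_back(access);
    }
  }
  return accesses;
}
```

  The scan over all variables for every command is quadratic and is chosen
  here only for brevity; an implementation concerned with speed would instead
  iterate over each command's read and written lists.
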
  The matrix accesses are computed by the function ComputeMatrixAccesses().

  \section dnn3_optimize_checking Checking the computation

  After performing the analysis as described in the previous section, we can
  check the computation using class ComputationChecker. We list part of its
  declaration below; the names of its private member functions indicate the
  kinds of checks it performs.
\verbatim
class ComputationChecker {
 public:
  ComputationChecker(const CheckComputationConfig &config,
                     const Nnet &nnet,
                     const ComputationRequest &request,
                     const NnetComputation &computation);
  void Check();
 private:
  // various dimension consistency checks and checks on properties.
  void CheckComputationIndexes() const;
  // make sure Propagate comes before kNoOpMarker and Backprop comes after it,
  // and that the value of forward_computation_end matches the position of
  // kNoOpMarker.
  void CheckComputationOrder() const;
  // checks for a situation where an undefined variable is read.
  void CheckComputationUndefined() const;
  // checks that all writes are done before reads.  Details are with the
  // implementation.
  void CheckComputationRewrite() const;
  // check matrix accesses make sense.
  void CheckComputationMatrixAccesses() const;
  ...
\endverbatim
  This checking code exists mainly to detect bugs in the compilation and
  optimization code.

  \section dnn3_optimize_optimization Optimization

  The optimization code has an options class that lets the user turn off
  individual optimizations. This is intended to help in debugging. The options
  class has a variable for each individual optimization:
\verbatim
struct NnetOptimizeConfig {
  bool optimize;  // setting this to false disallows all optimization.
  bool propagate_in_place;
  bool backprop_in_place;
  bool remove_assignments;
  bool initialize_undefined;
  bool move_sizing_commands;
  ...
};
\endverbatim
  The top-level call to the optimization code is just a function call.
  We show some partial code for this function below:
\verbatim
void Optimize(const NnetOptimizeConfig &config,
              const Nnet &nnet,
              const ComputationRequest &request,
              NnetComputation *computation) {
  if (!config.optimize)
    return;
  bool changed = true;
  while (changed) {
    changed = false;
    VariableMergingOptimizer opt(config, nnet, request, computation);
    if (opt.MergeVariables())
      changed = true;
  }
  if (config.initialize_undefined)
    RemoveUnnecessaryZeroing(nnet, computation);
  if (config.move_sizing_commands)
    MoveSizingCommands(nnet, computation);
}
\endverbatim
  The VariableMergingOptimizer is a class that is responsible for merging
  variables together; it detects situations where there are two separate
  matrices that can be replaced with a single matrix.

  - Up: \ref dnn3
  - Previous: \ref dnn3_code_compilation

*/

}
}