Blame view
src/doc/dnn3_code_optimization.dox
14 KB
8dcb6dfcb first commit |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 |
// doc/dnn3_code_optimization.dox // Copyright 2015 Johns Hopkins University (author: Daniel Povey) // See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License. namespace kaldi { namespace nnet3 { /** \page dnn3_code_optimization Optimization in the "nnet3" setup \section dnn3_code_optimization_intro Introduction This page covers the code-optimization process in the "nnet3" setup, in which we modify the sequence of commands stored in the NnetComputation object in order to make the execution more efficient. - Previous: \ref dnn3_code_compilation - Up: \ref dnn3 \section dnn3_optimize_overview Overview of optimization The optimization process is something that happens after compilation. It consists of modifying the NnetComputation to make it more efficient. From the point of view of the user it is just a single function call: \verbatim void Optimize(const NnetOptimizeConfig &config, const Nnet &nnet, const ComputationRequest &request, NnetComputation *computation); \endverbatim Internally, this performs various different types of optimizations, which we will go through below. We also discuss code analysis, which is used in the optimization code to help figure out which changes to the code are permissible; and code checking. This page is organized as: - \ref dnn3_optimize_analysis - \ref dnn3_optimize_checking - \ref dnn3_optimize_optimization \section dnn3_optimize_analysis Code analysis As mentioned, we have in nnet-analyze.h various utilities for analyzing code. We defined for a convenience a struct Analyzer which runs all this analysis on some compiled code: \verbatim struct Analyzer { ComputationVariables variables; std::vector<CommandAttributes> command_attributes; std::vector<std::vector<Access> > variable_accesses; std::vector<MatrixAccesses> matrix_accesses; void Init(const Nnet &nnet, const NnetComputation &computation); }; \endverbatim The Init function sets up its members: \verbatim void Analyzer::Init(const Nnet &nnet, const NnetComputation &computation) { variables.Init(computation); ComputeCommandAttributes(nnet, computation, variables, &command_attributes); ComputeVariableAccesses(variables, command_attributes, &variable_accesses); ComputeMatrixAccesses(nnet, computation, variables, command_attributes, &matrix_accesses); } \endverbatim There are really four function calls here, because the constructor of member "variables" sets up the member of type ComputationVariables. \subsection dnn3_optimize_analysis_variables Computation variables In order to analyze the computation, it is helpful to break it down into actions on individual variables. If efficiency were not an issue, we could do the analysis at the level of individual elements of a matrix, declaring these to be the variables. However, for now we do the analysis at a slightly coarser-grained level, consisting of column-ranges of matrices. We choose the coarsest set of row ranges such that the column-ranges of all the submatrices can be expressed exactly as a union of these column ranges. Class ComputationVariables is responsible for identifying these column ranges and for getting the set of variables associated with a given matrix or sub-matrix. Note that it is possible that in the future we may decide to do the analysis at a finer level (e.g. individual row of the current variables) which would enable more complete optimization in certain rather specialized circumstances. This would not involve very extensive changes to the code outside of class ComputationVariables. A "variable" is a zero-based index that corresponds to one of the variables identified by class ComputationVariables. The public interface of this class is below: \verbatim class ComputationVariables { public: void Init(const NnetComputation &computation); // This function updates the CommandAttributes object to record an access of // type read, write or read-write on the variables that this sub-matrix // corresponds to, and also updates the matrices_accessed variable by adding // the number of the underlying matrix. void RecordAccessForSubmatrix( int32 submatrix_index, AccessType access_type, CommandAttributes *ca) const; // Appends to variables_indexes the list of variables corresponding to a // matrix index. void AppendVariablesForMatrix( int32 matrix_index, std::vector<int32> *variable_indexes) const; int32 NumVariables() const { return num_variables_; } int32 GetMatrixForVariable(int32 variable) const; private: ... }; \endverbatim The \ref ComputationVariables::RecordAccessForSubmatrix() "RecordAccessForSubmatrix()" function won't be very self-explanatory because we haven't yet introduced struct CommandAttributes. We'll say more about it below. \subsection dnn3_optimize_analysis_attributes Command attributes Struct CommandAttributes records which variables are read and which variables written, and also which matrices are read and written. \verbatim struct CommandAttributes { // variables read std::vector<int32> variables_read; // variables written std::vector<int32> variables_written; // matrices read std::vector<int32> matrices_read; // matrices written std::vector<int32> matrices_written; // true if this command has side effects e.g. on the model (such as // Backprop on an updatable component, or StoreStats). bool has_side_effects; CommandAttributes(): has_side_effects(false) { } }; \endverbatim Some operations must be considered read/write instead of just read or write. For instance, adding something to a matrix is a read/write operation because the final result depends on what was there previously. In these cases we add a variable (or matrix) to both read and written lists. In addition, a pure-write operation that accesses only <em>some parts</em> of a variable or matrix must be considered a read/write operation on that variable or matrix, because the final value still depends on the contents at the start. The function ComputationVariables::RecordAccessForSubmatrix() is responsible for updating the CommandAttributes variable for commands; it is declared as follows. \verbatim void RecordAccessForSubmatrix( int32 submatrix_index, AccessType access_type, CommandAttributes *ca) const; \endverbatim where <code>AccessType</code> is an enum that can take values <code>kReadAccess</code>, <code>kWriteAccess</code> and <code>kReadWriteAccess</code>. \subsection dnn3_optimize_analysis_attributes_computing Computing the command attributes After initializing the ComputationVariables object, the next stage in analysis of a computation is to obtain a vector of CommandAttributes, one for each command in the computation. The function \ref ComputeCommandAttributes() is responsible for this. This function is mostly a big switch statement, and we show the first part of it in order to give the reader some idea what is going on: \verbatim void ComputeCommandAttributes( const Nnet &nnet, const NnetComputation &computation, const ComputationVariables &vars, std::vector<CommandAttributes> *attributes) { int32 num_commands = computation.commands.size(); attributes->clear(); attributes->resize(num_commands); for (int32 command_index = 0; command_index < num_commands; command_index++) { const NnetComputation::Command &c = computation.commands[command_index]; CommandAttributes &attr = (*attributes)[command_index]; switch (c.command_type) { case NnetComputation::kAllocMatrixZeroed: vars.AppendVariablesForMatrix(c.arg1, &attr.variables_written); break; case NnetComputation::kAllocMatrixUndefined: // nothing is written here. break; case NnetComputation::kDeallocMatrix: // ditto. break; case NnetComputation::kPropagate: vars.RecordAccessForSubmatrix(c.arg3, kReadAccess, &attr); if (nnet.GetComponent(c.arg1)->Properties() & kPropagateAdds) vars.RecordAccessForSubmatrix(c.arg4, kReadWriteAccess, &attr); else vars.RecordAccessForSubmatrix(c.arg4, kWriteAccess, &attr); break; ... \endverbatim \subsection dnn3_optimize_analysis_variable_accesses Computing the variable accesses The next stage in analysis is to compute the variable accesses. This takes the information we stored in the CommandAttributes, and list it per variable We define a struct Access as: \verbatim struct Access { int32 command_index; AccessType access_type; }; \endverbatim where AccessType is an enumeration value mentioned above. The accesses to any variable will be stored as a <code>std::vector<Access></code>, and function that computes the accesses for all variables is declared as follows: \verbatim void ComputeVariableAccesses( const ComputationVariables &variables, const std::vector<CommandAttributes> &command_attributes, std::vector<std::vector<Access> > *variable_accesses); \endverbatim The output <code>variable_accesses</code> is a vector of length (the number of variables), and then a list sorted by command index. There will be only one access per command, as we consolidate them- for example, a combination of a read and write access would be consolidated into a single <code>kReadWrite</code> access. \subsection dnn3_optimize_analysis_variable_matrix Computing the matrix accesses Struct MatrixAccesses stores all the information we record for a single matrix, relating to how it is allocated and accessed: \verbatim struct MatrixAccesses { // Index of the command that allocates the matrix, or -1 if the command // doesn't exist (e.g. it is an input). int32 allocate_command; // Index of the command that deallocates the matrix, or -1 if never gets // deallocated (e.g. it is an output). int32 deallocate_command; // Records the indexes of commands that access the matrix, and the type // (read, read/write, write). It will be sorted on command index with only // one record per command. Note: a write to only a part of the matrix // (i.e. a submatrix that isn't the whole thing) will be recorded as an // access of type read/write. std::vector<Access> accesses; // true if this matrix is an input to the computation. bool is_input; // true if this matrix is an output of the computation. bool is_output; MatrixAccesses(): allocate_command(-1), deallocate_command(-1), is_input(false), is_output(false) { } }; \endverbatim You can see that we store more information than we do for the variables (i.e. more than just the <code>std::vector<Access></code>). This is so that we can check whether the matrix is being allocated and deallocated appropriately. The matrix accesses are computed by the function ComputeMatrixAccesses(). \section dnn3_optimize_checking Checking the computation After performing the analysis as described in the previous section, we can check the computation using class ComputationChecker. We list some of its code below; a glance at some of the names of private function members will indicate the kind of checks it is performing. \verbatim class ComputationChecker { public: ComputationChecker(const CheckComputationConfig &config, const Nnet &nnet, const ComputationRequest &request, const NnetComputation &computation); void Check(); private: // various dimension consistency checks and checks on properties. void CheckComputationIndexes() const; // make sure Propagate comes before kNoOpMarker and Backprop comes after it, // and that the value of forward_computation_end matches the position of // kNoOpMarker. void CheckComputationOrder() const; // checks for a situation where an undefined variable is read. void CheckComputationUndefined() const; // checks that all writes are done before reads. details with implementation. void CheckComputationRewrite() const; // check matrix accesses make sense. void CheckComputationMatrixAccesses() const; ... \endverbatim This checking code exists mainly to detect bugs in the compilation and optimization code. \section dnn3_optimize_optimization Optimization The optimization code has an options class that enables the user to turn off various of the specific optimizations it does. This is intended to help in debugging. The options class has a variable for each individual optimization: \verbatim struct NnetOptimizeConfig { bool optimize; // setting this false disallow all optimization. bool propagate_in_place; bool backprop_in_place; bool remove_assignments; bool initialize_undefined; bool move_sizing_commands; ... }; \endverbatim The top-level call to the optimization code is just a function call. We show some partial code for this function below: \verbatim void Optimize(const NnetOptimizeConfig &config, const Nnet &nnet, const ComputationRequest &request, NnetComputation *computation) { if (!config.optimize) return; bool changed = true; while (changed) { changed = false; VariableMergingOptimizer opt(config, nnet, request, computation); if (opt.MergeVariables()) changed = true; } if (config.initialize_undefined) RemoveUnnecessaryZeroing(nnet, computation); if (config.move_sizing_commands) MoveSizingCommands(nnet, computation); } \endverbatim The VariableMergingOptimizer is a class that is responsible for merging variables together; it detects situations where there are two separate matrices that can be replaced with a single matrix. - Up: \ref dnn3 - Previous: \ref dnn3_code_compilation */ } } |