// doc/dnn3_code_optimization.dox


// Copyright 2015   Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

namespace kaldi {
namespace nnet3 {

/**
\page dnn3_code_optimization Optimization in the "nnet3" setup

\section dnn3_code_optimization_intro Introduction

  This page covers the code-optimization process in the "nnet3" setup, in which
  we modify the sequence of commands stored in the NnetComputation object in order
  to make the execution more efficient.

  - Previous: \ref dnn3_code_compilation
  - Up: \ref dnn3


\section dnn3_optimize_overview Overview of optimization

The optimization process is something that happens after compilation.  It consists of
modifying the NnetComputation to make it more efficient.  From the point of view
of the user it is just a single function call:
\verbatim
void Optimize(const NnetOptimizeConfig &config,
              const Nnet &nnet,
              const ComputationRequest &request,
              NnetComputation *computation);
\endverbatim
Internally, this performs various different types of optimizations, which we will
go through below.
We also discuss code analysis, which is used in the optimization code to help figure out which changes to the code
are permissible; and code checking.

This page is organized as:
  - \ref dnn3_optimize_analysis
  - \ref dnn3_optimize_checking
  - \ref dnn3_optimize_optimization

\section dnn3_optimize_analysis Code analysis

As mentioned, we have in nnet-analyze.h various utilities for analyzing code.
We defined for a convenience a struct Analyzer which runs all this analysis
on some compiled code:
\verbatim
struct Analyzer {
  ComputationVariables variables;
  std::vector<CommandAttributes> command_attributes;
  std::vector<std::vector<Access> > variable_accesses;
  std::vector<MatrixAccesses> matrix_accesses;
  void Init(const Nnet &nnet, const NnetComputation &computation);
};
\endverbatim
The Init function sets up its members:
\verbatim
void Analyzer::Init(const Nnet &nnet, const NnetComputation &computation) {
  variables.Init(computation);
  ComputeCommandAttributes(nnet, computation, variables, &command_attributes);
  ComputeVariableAccesses(variables, command_attributes, &variable_accesses);
  ComputeMatrixAccesses(nnet, computation, variables, command_attributes,
                        &matrix_accesses);
}
\endverbatim
There are really four function calls here, because the constructor
of member "variables" sets up the member of type ComputationVariables.

\subsection dnn3_optimize_analysis_variables Computation variables

In order to analyze the computation, it is helpful to break it down into actions
on individual variables.  If efficiency were not an issue, we could do the
analysis at the level of individual elements of a matrix, declaring these
to be the variables.  However, for now we do the
analysis at a slightly coarser-grained level, consisting of column-ranges of
matrices.  We choose the coarsest set of row ranges such that the column-ranges of
all the submatrices can be expressed exactly as a union of these column ranges.
Class ComputationVariables is responsible for identifying these column ranges
and for getting the set of variables associated with a given matrix or sub-matrix.

Note that it is possible that in the future we may decide to do the analysis at
a finer level (e.g. individual row of the current variables) which would enable
more complete optimization in certain rather specialized circumstances.  This
would not involve very extensive changes to the code outside of class
ComputationVariables.

A "variable" is a zero-based index that corresponds to one of the variables
identified by class ComputationVariables.  The public interface of this class
is below:
\verbatim
class ComputationVariables {
 public:
  void Init(const NnetComputation &computation);

  // This function updates the CommandAttributes object to record an access of
  // type read, write or read-write on the variables that this sub-matrix
  // corresponds to, and also updates the matrices_accessed variable by adding
  // the number of the underlying matrix.
  void RecordAccessForSubmatrix(
      int32 submatrix_index,
      AccessType access_type,
      CommandAttributes *ca) const;
  // Appends to variables_indexes the list of variables corresponding to a
  // matrix index.
  void AppendVariablesForMatrix(
      int32 matrix_index,
      std::vector<int32> *variable_indexes) const;
  int32 NumVariables() const { return num_variables_; }
  int32 GetMatrixForVariable(int32 variable) const;

 private:
   ...
};
\endverbatim
The \ref ComputationVariables::RecordAccessForSubmatrix() "RecordAccessForSubmatrix()" function
won't be very self-explanatory because we haven't yet introduced struct CommandAttributes.
We'll say more about it below.

\subsection dnn3_optimize_analysis_attributes  Command attributes

Struct CommandAttributes records which variables are read and which variables
written, and also which matrices are read and written.
\verbatim
struct CommandAttributes {
  // variables read
  std::vector<int32> variables_read;
  // variables written
  std::vector<int32> variables_written;

  // matrices read
  std::vector<int32> matrices_read;
  // matrices written
  std::vector<int32> matrices_written;

  // true if this command has side effects e.g. on the model (such as
  // Backprop on an updatable component, or StoreStats).
  bool has_side_effects;
  CommandAttributes(): has_side_effects(false) { }
};
\endverbatim
Some operations must be considered read/write instead of just read or write.
For instance, adding something to a matrix is a read/write operation because the
final result depends on what was there previously.  In these cases we
add a variable (or matrix) to both read and written lists.  In addition,
a pure-write operation that accesses only <em>some parts</em> of a variable
or matrix must be considered a read/write operation on that variable
or matrix, because the final value still depends on the contents at the start.

The function ComputationVariables::RecordAccessForSubmatrix() is responsible
for updating the CommandAttributes variable for commands; it is declared as follows.
\verbatim
  void RecordAccessForSubmatrix(
      int32 submatrix_index,
      AccessType access_type,
      CommandAttributes *ca) const;
\endverbatim
where <code>AccessType</code> is an enum that can take values <code>kReadAccess</code>,
<code>kWriteAccess</code> and <code>kReadWriteAccess</code>.


\subsection dnn3_optimize_analysis_attributes_computing Computing the command attributes

After initializing the ComputationVariables object, the
next stage in analysis of a computation is to obtain a vector of CommandAttributes,
one for each command in the computation.  The function \ref ComputeCommandAttributes()
is responsible for this.  This function is mostly a big switch statement, and we
show the first part of it in order to give the reader some idea what is
going on:
\verbatim
void ComputeCommandAttributes(
    const Nnet &nnet,
    const NnetComputation &computation,
    const ComputationVariables &vars,
    std::vector<CommandAttributes> *attributes) {
  int32 num_commands = computation.commands.size();
  attributes->clear();
  attributes->resize(num_commands);
  for (int32 command_index = 0; command_index < num_commands; command_index++) {
    const NnetComputation::Command &c = computation.commands[command_index];
    CommandAttributes &attr = (*attributes)[command_index];
    switch (c.command_type) {
      case NnetComputation::kAllocMatrixZeroed:
        vars.AppendVariablesForMatrix(c.arg1, &attr.variables_written);
        break;
      case NnetComputation::kAllocMatrixUndefined: // nothing is written here.
        break;
      case NnetComputation::kDeallocMatrix: // ditto.
        break;
      case NnetComputation::kPropagate:
        vars.RecordAccessForSubmatrix(c.arg3, kReadAccess, &attr);
        if (nnet.GetComponent(c.arg1)->Properties() & kPropagateAdds)
          vars.RecordAccessForSubmatrix(c.arg4, kReadWriteAccess, &attr);
        else
          vars.RecordAccessForSubmatrix(c.arg4, kWriteAccess, &attr);
        break;
        ...
\endverbatim

\subsection dnn3_optimize_analysis_variable_accesses Computing the variable accesses

The next stage in analysis is to compute the variable accesses.  This
takes the information we stored in the CommandAttributes, and list it
per variable  We define a struct Access as:
\verbatim
struct Access {
  int32 command_index;
  AccessType access_type;
};
\endverbatim
where AccessType is an enumeration value mentioned above.  The accesses
to any variable will be stored as a <code>std::vector<Access></code>, and
function that computes the accesses for all variables is declared as follows:
\verbatim
void ComputeVariableAccesses(
    const ComputationVariables &variables,
    const std::vector<CommandAttributes> &command_attributes,
    std::vector<std::vector<Access> > *variable_accesses);
\endverbatim
The output <code>variable_accesses</code> is a vector of length (the number of variables),
and then a list sorted by command index.  There will be only one access per command,
as we consolidate them- for example, a combination of a read and write access
would be consolidated into a single <code>kReadWrite</code> access.


\subsection dnn3_optimize_analysis_variable_matrix Computing the matrix accesses

Struct MatrixAccesses stores all the information we record for a single
matrix, relating to how it is allocated and accessed:
\verbatim
struct MatrixAccesses {
  // Index of the command that allocates the matrix, or -1 if the command
  // doesn't exist (e.g. it is an input).
  int32 allocate_command;
  // Index of the command that deallocates the matrix, or -1 if never gets
  // deallocated (e.g. it is an output).
  int32 deallocate_command;
  // Records the indexes of commands that access the matrix, and the type
  // (read, read/write, write).  It will be sorted on command index with only
  // one record per command.  Note: a write to only a part of the matrix
  // (i.e. a submatrix that isn't the whole thing) will be recorded as an
  // access of type read/write.
  std::vector<Access> accesses;
  // true if this matrix is an input to the computation.
  bool is_input;
  // true if this matrix is an output of the computation.
  bool is_output;
  MatrixAccesses(): allocate_command(-1), deallocate_command(-1),
                    is_input(false), is_output(false) { }
};
\endverbatim
You can see that we store more information than we do for the variables (i.e. more than just
the <code>std::vector<Access></code>).  This is so that we can check whether
the matrix is being allocated and deallocated appropriately.

The matrix accesses are computed by the function ComputeMatrixAccesses().

\section dnn3_optimize_checking Checking the computation

After performing the analysis as described in the previous section, we can
check the computation using class ComputationChecker.   We list some of
its code below; a glance at some of the names of private function members
will indicate the kind of checks it is performing.
\verbatim
class ComputationChecker {
 public:
  ComputationChecker(const CheckComputationConfig &config,
                     const Nnet &nnet,
                     const ComputationRequest &request,
                     const NnetComputation &computation);
  void Check();
 private:
  // various dimension consistency checks and checks on properties.
  void CheckComputationIndexes() const;
  // make sure Propagate comes before kNoOpMarker and Backprop comes after it,
  // and that the value of forward_computation_end matches the position of
  // kNoOpMarker.
  void CheckComputationOrder() const;
  // checks for a situation where an undefined variable is read.
  void CheckComputationUndefined() const;
  // checks that all writes are done before reads.  details with implementation.
  void CheckComputationRewrite() const;
  // check matrix accesses make sense.
  void CheckComputationMatrixAccesses() const;
  ...
\endverbatim
This checking code exists mainly to detect bugs in the compilation and optimization code.


\section dnn3_optimize_optimization Optimization

The optimization code has an options class that enables the user to turn off various
of the specific optimizations it does.  This is intended to help in debugging.
The options class has a variable for each individual optimization:
\verbatim
struct NnetOptimizeConfig {
  bool optimize;  // setting this false disallow all optimization.
  bool propagate_in_place;
  bool backprop_in_place;
  bool remove_assignments;
  bool initialize_undefined;
  bool move_sizing_commands;
  ...
};
\endverbatim
The top-level call to the optimization code is just a function call.
We show some partial code for this function below:
\verbatim
void Optimize(const NnetOptimizeConfig &config,
              const Nnet &nnet,
              const ComputationRequest &request,
              NnetComputation *computation) {
  if (!config.optimize)
    return;
  bool changed = true;
  while (changed) {
    changed = false;
    VariableMergingOptimizer opt(config, nnet, request, computation);
    if (opt.MergeVariables())
      changed = true;
  }
  if (config.initialize_undefined)
    RemoveUnnecessaryZeroing(nnet, computation);

  if (config.move_sizing_commands)
    MoveSizingCommands(nnet, computation);
}
\endverbatim
The VariableMergingOptimizer is a class that is responsible for merging
variables together; it detects situations where there are two separate
matrices that can be replaced with a single matrix.

 - Up: \ref dnn3
 - Previous: \ref dnn3_code_compilation

*/

}
}