Blame view
src/doc/tree_internals.dox
14.2 KB
8dcb6dfcb first commit |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 |
// doc/tree_internals.dox // Copyright 2009-2011 Microsoft Corporation // See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License. namespace kaldi { /** \page tree_internals Decision tree internals This page describes the internals of the phonetic decision-tree clustering code, which are implemented as quite generic structures and algorithms. For a higher-level view that explains the overall algorithm being implemented and how it is used by the toolkit, see \ref tree_externals. \section treei_event_map Event maps The central concept in the decision-tree building code is the event map, represented by type EventMap. Don't be misled by the term "event" to think that this is something happening at a particular time. An event is just a set of (key,value) pairs, with no key repeated. Conceptually, it could be represented by the type std::map<int32,int32>. In fact, for efficiency we represent it as a sorted vector of pairs, via the typedef: \code typedef std::vector<std::pair<EventKeyType,EventValueType> > EventType; \endcode Here, EventKeyType and EventValueType are just typedefs to int32, but the code is easier to understand if we use distinct names for them. Think of an Event (type EventType) as being like a collection of variables, except that the names and values of the variables are both integers. There is also the type \ref EventAnswerType, and this is also int32. It is the type returned by the EventMap; in practice it is a pdf identifier (acoustic-state index). Functions that require EventType expect you to have sorted it first (e.g. by calling std::sort(e.begin(),e.end()); ) The purpose of the EventMap class is best illustrated via an example. Suppose our phone-in-context is a/b/c; suppose we have the standard 3-state topology; and suppose we want to ask what is the index of the pdf for the central state of phone "b" in this context. The state in question would be state 1, since we use zero-based indexing. We are skimming over a few details here when we refer to "state"; see \ref pdf_class for more information. Suppose the integer indices of phones a, b and c are 10, 11 and 12 respectively. The "event" would correspond to the mapping: \verbatim (0 -> 10), (1 -> 11), (2 -> 12), (-1 -> 1) \endverbatim where 0, 1 and 2 are the positions in the 3-phone window "a/b/c", and -1 is a special index we use to encode the state-id (c.f. the constant \ref kPdfClass = -1). The way this would be represented as a sorted vector of pairs is: \code // caution: not valid C++. EventType e = { {-1,1}, {0,10}, {1,11}, {2,12} }; \endcode Suppose we that the acoustic-state index (pdf-id) corresponding to this acoustic state happens to be 1000. Then if we have an EventMap "emap" representing the tree, then we would expect the following assert not to fail: \code EventAnswerType ans; bool ret = emap.Map(e, &ans); // emap is of type EventMap; e is EventType KALDI_ASSERT(ret == true && ans == 1000); \endcode So when we state that an EventMap is a map from EventType to EventAnswerType, you can think of this roughly as a map from a phone-in-context to an integer index. The phone-in-context is represented as a set of (key-value) pairs, and in principle we could add new keys and put in more information. Notice that the EventMap::Map() function returns bool. This is because it's possible for certain events not to map to any answer at all (e.g. consider an invalid phone, or imagine what happens when the EventType doesn't contain all the keys that the EventMap is querying). The EventMap is a very passive object. It does not have any capability to learn decision-trees, it is just a means to store decision-trees. Think of it as a structure that represents a function from EventType to EventAnswerType. EventMap is a polymorphic, pure-virtual class (i.e. it cannot be instantiated because it has virtual functions that are not implemented). There are three concrete classes that implement the EventMap interface: - ConstantEventMap: Think of this as a decision-tree leaf node. This class stores an integer of type EventAnswerType and its Map function always returns that value. - SplitEventMap: Think of this as a decision-tree non-leaf node that queries a certain key and goes to the "yes" or "no" child node depending on the answer. Its Map function calls the Map function of the appropriate child node. It stores a set of integers of type kAnswerType that correspond to the "yes" child (everything else goes to "no"). - TableEventMap: This does a complete split on a particular key. A typical example is: you might first split completely on the central phone and then have a separate decision tree for each value of that phone. Internally it stores a vector of EventMap* pointers. It looks up the value corresponding to the key that it is splitting on, and calls the Map function of the EventMap at the corresponding position in the vector. The EventMap cannot really do very much apart from map EventType to EventAnswerType. Its interface does not provide functions that allow you to traverse the tree, either up or down. There is just one function that allows you to modify the EventMap, and this is the EventMap::Copy() function, declared like this (as a class-member): \code virtual EventMap *Copy(const std::vector<EventMap*> &new_leaves) const; \endcode This has an effect similar to function composition. If you call Copy() with an empty vector "new_leaves", it will just return a deep copy of the entire object, copying all the pointers as it goes down the tree. However, if new_leaves is nonempty, each time the Copy() function reaches a leaf, if that leaf \c l is within the range (0, new_leaves.size()-1) and \c new_leaves[\c l] is non-NULL, the Copy() function will return the result of calling \c new_leaves[\c l]->\c Copy(), rather than the \c Copy() function of the leaf itself (which would have returned a new ConstantEventMap). A typical example is where you decide to split a particular leaf, say leaf 852. You could create an object of type vector<EventMap*> whose only non-NULL member is as position 852. It would contain a pointer to an object whose real type is SplitEventMap, and the "yes" and "no" pointers would be to type ConstantEventMap with leaf values, say, 852 and 1234 (we reuse the old leaf-node id for the new leaf). The real tree-building code is not this inefficient. \section treei_stats Statistics for building the tree The statistics used to build the phonetic decision tree are of the following type: \code typedef std::vector<std::pair<EventType, Clusterable*> > BuildTreeStatsType; \endcode A reference to an object of this type is passed to all the decision-tree building routines. These statistics are expected to contain no duplicates with the same EventType member, i.e. they conceptually represent a map from \ref EventType to Clusterable. In our current code the Clusterable objects are really of type GaussClusterable, but the tree-building code doesn't know this. A program that accumulates these statistics is acc-tree-stats.cc. \section treei_func Classes and functions involved in tree-building \subsection treei_func_questions Questions (config class) The class Questions is a class that in its interaction with the tree-building code, behaves a little bit like a "configuration" class. It is really a map from the "key" value (of type \ref EventKeyType), to a configuration class of type QuestionsForKey. Class QuestionsForKey has a set of "questions", each of which is just a set of integers of type \ref EventValueType; these mostly correspond to sets of phones, or if the key is -1, they would in the typical case correspond to sets of HMM-state indices, i.e. subsets of {0,1,2}. QuestionsForKey also contains a configuration class of type RefineClustersOptions. This controls tree-building behaviour in that the tree-building code will attempt to iteratively move values (e.g. phones) between the two sides of the split in order to maximize likelihood (as in K-means with K=2). However, this can be turned off by setting the number of iterations of "refining clusters" to zero, which corresponds to just choosing from among a fixed list of questions. This seems to work a bit better. \subsection treei_func_low Lowest-level functions See \ref tree_group_lower for a complete list; we summarize some important ones here. These functions mostly involve manipulations of objects of type BuildTreeStatsType, which as we saw above is a vector of pairs of (EventType, Clusterable*). The simplest ones are DeleteBuildTreeStats(), WriteBuildTreeStats() and ReadBuildTreeStats(). The function PossibleValues() finds out which values a particular key takes within a collection of statistics (and informs the user whether or not the key was always defined); SplitStatsByKey() will split up an object of type BuildTreeStatsType into a vector<BuildTreeStatsType> according to the value taken by a particular key (e.g. splitting on the central phone); SplitStatsByMap() does the same but the index is not the value for that key, but the answer returned by an EventMap that is provided to the function. SumStats() sums up the statistics (i.e. the Clusterable objects) in an object of BuildTreeStatsType and returns corresponding Clusterable* object. SumStatsVec() takes an object of type vector<BuildTreeStatsType> and outputs something of type vector<Clusterable*>, i.e. it's like SumStats() but for a vector; it's useful in processing the output of SplitStatsByKey() and SplitStatsByMap(). ObjfGivenMap() evaluates the objective function given some statistics and an EventMap: it sums up all the statistics within each cluster (corresponding to each distinct answer of the EventMap::Map() function), adds up the objective functions across the clusters, and returns the total. FindAllKeys() will find all the keys that are defined in a collection of statistics, and depending on the arguments it can find all keys that are defined for all the events, or all the keys that are defined for any event (i.e. take the intersection or union of sets of defined keys). \subsection treei_func_intermediate Intermediate-level functions The next group of functions involved in tree building are listed \ref tree_group_intermediate "here" and correspond to various stages of tree building. We will now only mention some representative ones. Firstly, we point out that a lot of these functions have a parameter int32 *num_leaves. The integer which this points to acts as a counter to allocate new leaves. At the start of tree building the caller would set it to zero. Aach time a new leaf is required, the tree-building code takes the number it currently points to as the new leaf-id, and then increments it. One important function is GetStubMap(). This function returns a tree that has not yet been split, i.e. the pdfs do not depend on the context. The input to this function controls whether all the phones are distinct or some of them share decision-tree roots, and whether all the states within a particular phone share the same decision-tree root. The function SplitDecisionTree() takes as input an EventMap that would normally correspond to a non-split "stub" decision tree of the type created by GetStubMap(), and does the decision-tree splitting until either the maximum number of leaves has been reached, or the gain from splitting a leaf is less than a specified threshold. The function ClusterEventMap() takes an EventMap and a threshold, and merges the leaves of the EventMap as long as the cost of doing so is less than the threshold. This function would normally be called after SplitDecisionTree(). There are other versions of this function that operate with restrictions (e.g. to avoid merging leaves from distinct phones). \subsection treei_func_top Top-level tree-building functions The top-level tree-building functions are listed \ref tree_group_top "here". These functions are called directly from the command-line programs. The most important one is BuildTree(). It is given a Questions configuration class, and some information about sets of phones whose roots to share and (for each set of phones) whether to share the \ref pdf_class "pdf-classes" in a single decision tree root or to have a separate root for each pdf-class. It is also passed information about the lengths of the phones, plus various thresholds. It builds the tree and returns the EventMap object which is used to construct the ContextDependency object. Another important routine in this group is AutomaticallyObtainQuestions(), which used to obtain questions via automatic clustering of phones. It clusters the phones into a tree, and for each node in the tree, all leaves accessible from that node form one question (questions are equivalent to sets of phones). This (called from cluster-phones.cc) makes our recipes independent of human-generated phone sets. The function AccumulateTreeStats() accumulates statistics for training the tree, given features and an alignment that is a sequence of transition-ids (see \ref transition_model_identifiers). This function is defined in a different directory from the other tree-building-related functions (hmm/ not tree/) as it depends on more code (e.g. it knows about the the TransitionModel class), and we preferred to keep the dependencies of the core tree-building code to a minimum. */ /** \defgroup tree_group Modules involved in tree building See \page tree_internals for documentation. */ } |