Blame view

egs/wsj/s5/conf/online_pitch.conf 2.99 KB
8dcb6dfcb   Yannick Estève   first commit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
  ## This config is given by conf/make_pitch_online.sh to the program compute-and-process-kaldi-pitch-feats,
  ## and is copied by steps/online/nnet2/prepare_online_decoding.sh and similar scripts, to be given
  ## to programs like online2-wav-nnet2-latgen-faster.
  ## The program compute-and-process-kaldi-pitch-feats will use it to compute pitch features that
  ## are the same as that those which will generated in online decoding; this enables us to train
  ## in a way that's compatible with online decoding.
  ## 
  
  ## most of these options relate to the post-processing rather than the pitch
  ## extraction itself.
  --add-raw-log-pitch=true   ## this is intended for input to neural nets, so our
                             ## approach is "throw everything in and see what
                             ## sticks".
  --normalization-left-context=100
  --normalization-right-context=10 # We're removing amost all the right-context
                                   # for the normalization.  The reason why we
                                   # include a small nonzero right-context (of
                                   # just 0.1 second) is that by adding a little
                                   # latency to the computation, it enables us to
                                   # get a more accurate estimate of the pitch on
                                   # the frame we're currently computing the
                                   # normalized pitch of.  We know for the current
                                   # frame that we will have at least 10 frames to
                                   # the right, and those extra 10 frames will
                                   # increase the quality of the Viterbi
                                   # backtrace.
                                   #
                                   # Note: our changes to the (left,right) context
                                   # from the defaults of (75,75) to (100,10) will
                                   # almost certainly worsen results, but will
                                   # reduce latency.
  --frames-per-chunk=10    ## relates to offline simulation of online decoding; 1
                           ## would be equivalent to getting in samples one by
                           ## one.
  --simulate-first-pass-online=true  ## this make the online-pitch-extraction code
                                     ## output the 'first-pass' features, which
                                     ## are less accurate than the final ones, and
                                     ## which are the only features the neural-net
                                     ## decoding would ever see (since we can't
                                     ## afford to do lattice rescoring in the
                                     ## neural-net code
  --delay=5  ## We delay all the pitch information by 5 frames.  This is almost
             ## certainly not helpful, but it helps to reduce the overall latency
             ## added by the pitch computation, from 10 (given by
             ## --normalization-right-context) to 10 - 5 = 5.