Blame view
egs/gale_arabic/s5/conf/online_pitch.conf
3.01 KB
8dcb6dfcb first commit |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
## This config is given by conf/make_pitch_online.sh to the program compute-and-process-kaldi-pitch-feats, ## and is copied by steps/online/nnet2/prepare_online_decoding.sh and similar scripts, to be given ## to programs like online2-wav-nnet2-latgen-faster. ## The program compute-and-process-kaldi-pitch-feats will use it to compute pitch features that ## are the same as that those which will generated in online decoding; this enables us to train ## in a way that's compatible with online decoding. ## ## most of these options relate to the post-processing rather than the pitch ## extraction itself. --sample-frequency=16000 --add-raw-log-pitch=true ## this is intended for input to neural nets, so our ## approach is "throw everything in and see what ## sticks". --normalization-left-context=75 --normalization-right-context=75 # We're removing amost all the right-context # for the normalization. The reason why we # include a small nonzero right-context (of # just 0.1 second) is that by adding a little # latency to the computation, it enables us to # get a more accurate estimate of the pitch on # the frame we're currently computing the # normalized pitch of. We know for the current # frame that we will have at least 10 frames to # the right, and those extra 10 frames will # increase the quality of the Viterbi # backtrace. # # Note: our changes to the (left,right) context # from the defaults of (75,75) to (100,10) will # almost certainly worsen results, but will # reduce latency. --frames-per-chunk=10 ## relates to offline simulation of online decoding; 1 ## would be equivalent to getting in samples one by ## one. --simulate-first-pass-online=true ## this make the online-pitch-extraction code ## output the 'first-pass' features, which ## are less accurate than the final ones, and ## which are the only features the neural-net ## decoding would ever see (since we can't ## afford to do lattice rescoring in the ## neural-net code --delay=0 ## We delay all the pitch information by 5 frames. This is almost ## certainly not helpful, but it helps to reduce the overall latency ## added by the pitch computation, from 10 (given by ## --normalization-right-context) to 10 - 5 = 5. |