online_pitch.conf
2.99 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
## This config is given by conf/make_pitch_online.sh to the program compute-and-process-kaldi-pitch-feats,
## and is copied by steps/online/nnet2/prepare_online_decoding.sh and similar scripts, to be given
## to programs like online2-wav-nnet2-latgen-faster.
## The program compute-and-process-kaldi-pitch-feats will use it to compute pitch features that
## are the same as that those which will generated in online decoding; this enables us to train
## in a way that's compatible with online decoding.
##
## most of these options relate to the post-processing rather than the pitch
## extraction itself.
--add-raw-log-pitch=true ## this is intended for input to neural nets, so our
## approach is "throw everything in and see what
## sticks".
--normalization-left-context=100
--normalization-right-context=10 # We're removing amost all the right-context
# for the normalization. The reason why we
# include a small nonzero right-context (of
# just 0.1 second) is that by adding a little
# latency to the computation, it enables us to
# get a more accurate estimate of the pitch on
# the frame we're currently computing the
# normalized pitch of. We know for the current
# frame that we will have at least 10 frames to
# the right, and those extra 10 frames will
# increase the quality of the Viterbi
# backtrace.
#
# Note: our changes to the (left,right) context
# from the defaults of (75,75) to (100,10) will
# almost certainly worsen results, but will
# reduce latency.
--frames-per-chunk=10 ## relates to offline simulation of online decoding; 1
## would be equivalent to getting in samples one by
## one.
--simulate-first-pass-online=true ## this make the online-pitch-extraction code
## output the 'first-pass' features, which
## are less accurate than the final ones, and
## which are the only features the neural-net
## decoding would ever see (since we can't
## afford to do lattice rescoring in the
## neural-net code
--delay=5 ## We delay all the pitch information by 5 frames. This is almost
## certainly not helpful, but it helps to reduce the overall latency
## added by the pitch computation, from 10 (given by
## --normalization-right-context) to 10 - 5 = 5.