## This config is given by conf/make_pitch_online.sh to the program compute-and-process-kaldi-pitch-feats,
## and is copied by steps/online/nnet2/prepare_online_decoding.sh and similar scripts, to be given
## to programs like online2-wav-nnet2-latgen-faster.
## The program compute-and-process-kaldi-pitch-feats will use it to compute pitch features that
## are the same as those that will be generated in online decoding; this enables us to train
## in a way that's compatible with online decoding.
## 
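## A minimal usage sketch, not taken from any script: the config path and the
## input/output paths below are hypothetical, and the invocation assumes the
## usual Kaldi form of options followed by a wav rspecifier and a feats
## wspecifier.
##
##   compute-and-process-kaldi-pitch-feats --config=conf/online_pitch.conf \
##     scp:data/train/wav.scp ark:pitch.ark
##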

## most of these options relate to the post-processing rather than the pitch
## extraction itself.
--add-raw-log-pitch=true   ## this is intended for input to neural nets, so our
                           ## approach is "throw everything in and see what
                           ## sticks".
--normalization-left-context=100
--normalization-right-context=10 # We're removing almost all the right-context
                                 # for the normalization.  We keep a small
                                 # nonzero right-context (just 0.1 second)
                                 # because the little latency it adds gives us
                                 # a more accurate estimate of the pitch on the
                                 # frame whose normalized pitch we're currently
                                 # computing.  We know that the current frame
                                 # will have at least 10 frames to its right,
                                 # and those extra 10 frames will increase the
                                 # quality of the Viterbi backtrace.
                                 #
                                 # Note: our changes to the (left,right) context
                                 # from the defaults of (75,75) to (100,10) will
                                 # almost certainly worsen results, but will
                                 # reduce latency.
--frames-per-chunk=10    ## relates to offline simulation of online decoding; 1
                         ## would be equivalent to receiving the input one
                         ## frame at a time.
--simulate-first-pass-online=true  ## this makes the online-pitch-extraction code
                                   ## output the 'first-pass' features, which
                                   ## are less accurate than the final ones, and
                                   ## which are the only features the neural-net
                                   ## decoding would ever see (since we can't
                                   ## afford to do lattice rescoring in the
                                   ## neural-net code).
--delay=5  ## We delay all the pitch information by 5 frames.  This almost
           ## certainly does not help accuracy, but it reduces the overall
           ## latency added by the pitch computation from 10 frames (given by
           ## --normalization-right-context) to 10 - 5 = 5 frames.
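##
## Worked latency arithmetic, as a rough sketch.  It assumes a 10 ms frame
## shift, which is what "10 frames = 0.1 second" above implies; adjust if your
## frame shift differs.
##   latency from right-context alone:  10 frames * 10 ms = 100 ms
##   latency after --delay=5:           (10 - 5) frames * 10 ms = 50 ms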