  Running the example Pykaldi scripts
  ===================================
  
  Summary
  -------
  The demo presents three new Kaldi features on pretrained Czech AMs:
  * Online Lattice Recogniser. The best results were obtained using MFCC, LDA+MLLT and bMMI.
  * Python wrapper which interfaces the OnlineLatticeRecogniser to Python.
  * Training scripts which can be used with standard Kaldi tools or with the new OnlineLatticeRecogniser.
  
  The script pykaldi-latgen-faster-decoder.py
  demonstrates how to use the PyOnlineLatgenRecogniser class,
  which takes audio as input and outputs the decoded lattice.
  There are also demos of the OnlineLatgenRecogniser C++ class and of the standard Kaldi gmm-latgen-faster decoder.
  All three demos produce the same results.
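  To illustrate the pattern described above (feed audio in chunks, decode incrementally, then retrieve the lattice), here is a self-contained sketch. The class and method names (frame_in, decode, get_lattice) are assumptions modelled on the description, not the real pykaldi bindings, and the mock returns only a frame count where the real recogniser returns a word lattice.

```python
class MockLatgenRecogniser:
    """Stands in for PyOnlineLatgenRecogniser: buffers raw audio and
    'decodes' it frame by frame (assumed API, for illustration only)."""
    FRAME_BYTES = 320  # e.g. 10 ms of 16 kHz 16-bit mono audio

    def __init__(self):
        self._buffer = b""
        self._decoded_frames = 0

    def frame_in(self, audio_chunk: bytes) -> None:
        # Accept raw audio; a real recogniser would extract features here.
        self._buffer += audio_chunk

    def decode(self, max_frames: int) -> int:
        # Consume up to max_frames whole frames from the buffer and
        # report how many were processed (0 means nothing left to do).
        available = len(self._buffer) // self.FRAME_BYTES
        n = min(max_frames, available)
        self._buffer = self._buffer[n * self.FRAME_BYTES:]
        self._decoded_frames += n
        return n

    def get_lattice(self):
        # A real recogniser returns a word lattice; we return a summary.
        return {"decoded_frames": self._decoded_frames}


def transcribe(recogniser, audio: bytes, chunk_size: int = 1024):
    """Drive the recogniser the way an online decoder is meant to be
    driven: interleave feeding audio chunks with decoding work."""
    for start in range(0, len(audio), chunk_size):
        recogniser.frame_in(audio[start:start + chunk_size])
        while recogniser.decode(max_frames=10):
            pass
    return recogniser.get_lattice()
```

  The point of the interleaved loop is that decoding keeps pace with the incoming audio, so the lattice is available almost immediately after the last chunk arrives.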
  
  TODO: Publish English AM and add English demo
  
  In March 2014, the PyOnlineLatticeRecogniser was evaluated on the domain of the Alex spoken dialogue system (SDS).
  See graphs evaluating OnlineLatticeRecogniser performance at 
  http://nbviewer.ipython.org/github/oplatek/pykaldi-eval/blob/master/Pykaldi-evaluation.ipynb.
  
  An example posterior word lattice output for one Czech utterance can be seen at 
  http://oplatek.blogspot.it/2014/02/ipython-demo-pykaldi-decoders-on-short.html
  
  Dependencies
  ------------
  * Build (make) and test (make test) the code under kaldi/src, kaldi/src/pykaldi and kaldi/src/onl-rec.
  * For inspecting the saved lattices you need the dot binary
    from the Graphviz library <http://www.graphviz.org/>.
  * For running the live demo you need the pyaudio package.
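  A quick way to verify the optional dependencies above is a small Python check; this is a generic sketch (not part of the repo) that only reports what is available and installs nothing.

```python
import shutil
import importlib.util

def check_dependencies():
    """Report availability of the optional demo dependencies."""
    report = {}
    # The 'dot' binary from Graphviz is needed to render saved lattices.
    report["dot"] = shutil.which("dot") is not None
    # The pyaudio module is needed for the live microphone demo.
    report["pyaudio"] = importlib.util.find_spec("pyaudio") is not None
    return report

if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```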
  
  Running the example scripts
  ---------------------------
  
  
      make online-latgen-recogniser
  
  * Run the test src/onl-rec/onl-rec-latgen-recogniser-test for OnlineLatgenRecogniser,
    which shows a C++ example of how to use the recogniser.
    The same data and the same pretrained language (LM) and acoustic (AM) models
    are used as for make pyonline-latgen-recogniser.
    The data as well as the models are downloaded from our server.
  
  
      make pyonline-latgen-recogniser
  
  * Run the decoding with PyOnlineLatgenRecogniser.
    The example Python script pykaldi-online-latgen-recogniser.py shows
    PyOnlineLatgenRecogniser decoding on a small test set.
    The same pretrained language (LM) and acoustic (AM) models are used.
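  Evaluating decoding on a test set like this usually means scoring the hypotheses against reference transcripts with word error rate (WER). The repo's scoring is not shown here; the following is a generic, self-contained WER sketch based on word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance (substitutions,
    insertions, deletions) divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

  For example, scoring "a x c" against the reference "a b c d" counts one substitution and one deletion, giving a WER of 0.5.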
  
  
      make gmm-latgen-faster
  
  * Run the decoding with the Kaldi gmm-latgen-faster executable wrapped in `<run_gmm-latgen-faster.sh>`_.
    The same data, AM and LM are used as for make pyonline-latgen-recogniser.
    We use this script as the reference implementation.
  
  
      make live
  
  * The simple live demo decodes speech from your microphone.
    It uses the pretrained AM and LM and wraps `<live-demo.py>`_.
    The pyaudio package is used for capturing the sound from your microphone.
    We were able to run it under `Ubuntu 12.10` with Python 2.7, but we guarantee nothing on other systems.
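  A minimal sketch of how live capture might feed a recogniser is below. It is not the repo's live-demo.py: the function name and parameters are illustrative, and it falls back to synthetic silence when pyaudio is unavailable (or the audio device cannot be opened) so the rest of the pipeline can still be exercised.

```python
def capture_chunks(seconds: float, rate: int = 16000, chunk: int = 1024):
    """Yield raw 16-bit mono audio chunks, from the microphone via
    pyaudio when possible, otherwise as silence of the same shape."""
    total_samples = int(seconds * rate)
    try:
        import pyaudio  # optional dependency, as noted in Dependencies
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1,
                         rate=rate, input=True, frames_per_buffer=chunk)
        try:
            for _ in range(0, total_samples, chunk):
                yield stream.read(chunk)
        finally:
            stream.stop_stream()
            stream.close()
            pa.terminate()
    except (ImportError, OSError):
        # No pyaudio or no usable input device: yield zeroed samples
        # (2 bytes per 16-bit sample) so downstream code still runs.
        for _ in range(0, total_samples, chunk):
            yield b"\x00\x00" * chunk
```

  Each yielded chunk could then be passed to the recogniser's audio input exactly as in the decoding loop described earlier.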
  
  
  Notes
  -----
  The scripts for Czech and English support acoustic models obtained using MFCC with LDA+MLLT or delta+delta-delta feature transformations, trained either generatively or discriminatively by MPE or bMMI training.
  
  The new functionality is separated into different directories:
   * kaldi/src/onl-rec stores the C++ code for OnlineLatticeRecogniser.
   * kaldi/src/pykaldi stores the Python wrapper PyOnlineLatticeRecogniser.
   * kaldi/egs/vystadial/s5 stores the training scripts.
   * kaldi/egs/vystadial/online_demo shows that the standard Kaldi decoder, OnlineLatticeRecogniser and PyOnlineLatticeRecogniser produce exactly the same lattices using the same setup.
  
  The OnlineLatticeRecogniser is used in the Alex dialogue system (https://github.com/UFAL-DSG/alex).