egs/wsj/README.txt 755 Bytes
8dcb6dfcb   Yannick Estève   first commit
  
  About the Wall Street Journal corpus:
      This is a corpus of read speech: sentences from the Wall Street
      Journal, recorded under clean conditions.  The vocabulary is quite
      large, and there are about 80 hours of training data.
      Available from the LDC as either:
        [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
      or:
        [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
      The latter option is cheaper and includes only the Sennheiser
      microphone data (which is all we use in the example scripts).
  
  Each subdirectory of this directory contains the scripts for a
  sequence of experiments.  [Note: most of the older example scripts
  have been deleted, but are still available at ^/branches/complete].
  
    s5: This is the current recommended recipe.