README.txt

About the Wall Street Journal corpus:
    This is a corpus of read sentences from the Wall Street Journal,
    recorded under clean conditions.  The vocabulary is quite large, and
    there are about 80 hours of training data.
    Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
    or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
    The latter option is cheaper and includes only the Sennheiser
    microphone data (which is all we use in the example scripts).

Each subdirectory of this directory contains the scripts for a sequence
of experiments.  [note: most of the older example scripts have been
deleted, but are still available at ^/branches/complete].

  s5: This is the current recommended recipe.