About the Wall Street Journal corpus:
    This is a corpus of read sentences from the Wall Street Journal,
    recorded under clean conditions. The vocabulary is quite large.
    About 80 hours of training data.
    Available from the LDC as either:
      [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
    or:
      [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
    The latter option is cheaper and includes only the Sennheiser
    microphone data (which is all we use in the example scripts).

Each subdirectory of this directory contains the scripts for a
sequence of experiments.
[note: most of the older example scripts have been deleted, but are
 still available at ^/branches/complete].

  s5: This is the current recommended recipe.