About the Switchboard corpus

    Switchboard is conversational telephone speech, collected as 2-channel data
    sampled at 8 kHz.  We use only the Switchboard-1 Phase 1 training data, which
    we believe corresponds to catalog number LDC97S62 (Switchboard-1 Release 2).
    We also use the Mississippi State transcriptions, which we download
    separately from
    http://www.isip.piconepress.com/projects/switchboard/releases/switchboard_word_alignments.tar.gz
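
    As a rough illustration of that separate download step, here is a minimal
    Python sketch.  It is not part of the recipe scripts: the function name
    fetch_transcripts and the destination directory are hypothetical, and only
    the URL comes from the text above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted from the README text; everything else is an illustrative assumption.
TRANSCRIPTS_URL = (
    "http://www.isip.piconepress.com/projects/switchboard/releases/"
    "switchboard_word_alignments.tar.gz"
)


def fetch_transcripts(dest_dir, url=TRANSCRIPTS_URL):
    """Download the tarball at `url` into `dest_dir` and unpack it there.

    Hypothetical helper, not taken from the recipe. Returns the sorted list
    of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Keep the archive under its original file name; skip the download
    # if a copy is already present.
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    with tarfile.open(str(archive), "r:gz") as tar:
        names = sorted(tar.getnames())
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(str(dest), filter="data")
        else:
            tar.extractall(str(dest))
    return names
```

    Calling fetch_transcripts("swbd_download") would place the archive and its
    contents under the (assumed) directory swbd_download.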

    For evaluation we use the eval2000 (a.k.a. hub5'00) data.  The acoustics are
    LDC2002S09 and the transcripts are LDC2002T43.

    We also use the RT'03 test set, available as LDC2007S10.  Note that not
    all parts of the recipe are tested on this set.

About the Fisher corpus for language modeling

    We use the Fisher English training speech transcripts for language modeling,
    if they are available.  The catalog numbers are LDC2004T19 for the part 1
    transcripts and LDC2005T19 for the part 2 transcripts.

Each subdirectory of this directory contains the
scripts for a sequence of experiments.

  s5: This is slightly out of date; please see s5c.

  s5b: This is (somewhat less) out of date; please see s5c.

  s5c: This is the current recipe.