About the Fisher-English corpus

    This is conversational telephone speech, collected as 2-channel data sampled
    at 8 kHz.  The data is similar to Switchboard, but most of the transcription
    was done in a "faster", lower-quality way.

    Fisher comes in two parts, and the text and speech have separate LDC numbers.
    This recipe uses both parts.  The LDC numbers are:

    The speech: LDC2004S13, LDC2005S13
    The text: LDC2004T19, LDC2005T19
 

Each subdirectory of this directory contains the
scripts for a sequence of experiments.

  s5: This recipe is still being worked on; the initial stages of
      training are ready.  Note that the data normalization is not compatible
      with our Switchboard setup: we have retained the conventions
      of the Fisher corpus, e.g. lower-case text, and acronyms written
      like c._n._n.
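
As a rough illustration of the Fisher conventions mentioned above, the
sketch below maps a token to lower-case and rewrites an all-caps acronym
in the c._n._n. style.  The helper name to_fisher_style and the rule
"any all-caps alphabetic token of two or more letters is an acronym" are
assumptions made for this example; they are not taken from the recipe's
scripts.

```python
def to_fisher_style(token: str) -> str:
    """Hypothetical helper: render a token in Fisher-style text.

    Assumption: an all-caps alphabetic token of length >= 2 is an acronym
    and is spelled letter by letter, e.g. 'CNN' -> 'c._n._n.'.
    Every other token is simply lower-cased.
    """
    if len(token) > 1 and token.isalpha() and token.isupper():
        # Each letter becomes '<letter>.', and letters are joined with '_'.
        return "_".join(ch.lower() + "." for ch in token)
    return token.lower()


def normalize_line(line: str) -> str:
    """Apply to_fisher_style to each whitespace-separated token."""
    return " ".join(to_fisher_style(tok) for tok in line.split())
```

For example, normalize_line("I saw it on CNN") yields
"i saw it on c._n._n." under these assumptions.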