egs/fisher_callhome_spanish/README.txt 1.73 KB
8dcb6dfcb   Yannick Estève   first commit
  Kaldi recipe for the Fisher and Callhome Spanish Corpora
  
  About the Fisher Spanish Corpus
    Fisher Spanish - Speech was developed by the Linguistic 
    Data Consortium (LDC) and consists of audio files covering
    roughly 163 hours of telephone speech from 136 native
    Caribbean Spanish and non-Caribbean Spanish speakers.
    Full orthographic transcripts of these audio files are available
    in LDC2010T04.
  
    Speech : LDC2010S01
    Transcripts : LDC2010T04
  
  About the Callhome Spanish Corpus
    The CALLHOME Spanish corpus of telephone speech consists
    of 120 unscripted telephone conversations between native speakers of Spanish.
    All calls, which lasted up to 30 minutes, originated in North America
    and were placed to international locations. Most participants called
    family members or close friends.
  
    Speech : LDC96S35
    Transcripts : LDC96T17
  
  The LDC Spanish rule-based lexicon
    The CALLHOME Spanish collection includes a lexical component. 
    The CALLHOME Spanish Lexicon consists of 45,582 words and contains
    separate information fields with phonological, morphological and
    frequency information for each word.
  
    Lexicon : LDC96L16
  
  
  Each subdirectory of this directory contains the
  scripts for a sequence of experiments.
  
    s5: This recipe is based on the WSJ s5 recipe. It works with the
        transcripts (available along with the script in LDC97T19). In addition,
        it uses a phonetic lexicon generated using the rule-based LDC lexicon.
        The recipe follows the Triphone+SGMM+SAT+fMLLR+SGMM+DNN pipeline. It uses the
        data partitions specified by LDC in the Callhome corpus description. For
        Fisher, custom partitions are available; check the run.sh file for the
        location of the split file, which can be changed.
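
    The s5 notes point to run.sh for the location of the Fisher split file.
    One way to find it is to list the lines of run.sh that mention "split".
    The sketch below is only an illustration: the helper name
    find_split_lines and the default keyword are assumptions made here and
    are not part of the recipe.

```python
from pathlib import Path


def find_split_lines(run_script, keyword="split"):
    """Return (line_number, text) pairs for lines that mention keyword.

    Hypothetical helper, not part of the recipe. The match is a
    case-insensitive substring search, so it may also return lines
    that are unrelated to the Fisher partition file.
    """
    needle = keyword.lower()
    hits = []
    lines = Path(run_script).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            hits.append((number, line.strip()))
    return hits
```

    For example, find_split_lines("s5/run.sh") would return the candidate
    lines, assuming the script lives at that path relative to this
    directory. The lines still have to be read by hand to see which one
    sets the split file location.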