Blame view

egs/lre07/README.txt 1.87 KB
8dcb6dfcb   Yannick Estève   first commit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
   This directory (lre07) contains example recipes for the 2007 NIST
   Language Evaluation.  The subdirectory v1 demonstrates the standard
   LID system, which is an I-Vector based recipe using full covariance
   GMM-UBM and logistic regression model. The subdirectory v2 demonstrates
   the LID system using a time delay deep neural network based UBM
   which is used to replace the GMM-UBM of v1. The DNN is trained using
   about 1800 hours of the English portion of Fisher.
  
  
   The following LDC corpora are used during training:
      
     SRE 2008 training set:                LDC2011S05 
     CALLFRIEND Vietnamese:                LDC96S60     
     CALLFRIEND Tamil:                     LDC96S59
     CALLFRIEND Japanese:                  LDC96S53
     CALLFRIEND Hindi:                     LDC96S52
     CALLFRIEND German:                    LDC96S51
     CALLFRIEND Farsi:                     LDC96S50
     CALLFRIEND French:                    LDC96S48
     CALLFRIEND Standard Arabic:           LDC96S49
     CALLFRIEND Korean:                    LDC96S54
     CALLFRIEND Mainland Chinese Mandarin: LDC96S55
     CALLFRIEND Taiwan Chinese Mandarin:   LDC96S56
     CALLFRIEND Caribbean Spanish:         LDC96S57
     CALLFRIEND Non-Caribbean Spanish:     LDC96S58
     LRE 1996:                             LDC2006S31
     LRE 2003:                             LDC2006S31
     LRE 2005:                             LDC2008S05
     LRE 2007 Training Set:                LDC2009S05
     LRE 2009:                             LDC2014S06
   
   Note that some of the corpora, e.g., SRE 2008 and the LREs used for 
   training contain multiple languages.  Because of this, it isn't
   necessarily vital that all of the corpora are present in your system. 
   
   The NIST 2007 Language Evaluation (LDC2009S04) is used for testing. 
   
   This list will be updated as scripts for system development and testing
   (which will require additional data sources) are created.