This directory (lre07) contains example recipes for the NIST 2007
 Language Recognition Evaluation.  The subdirectory v1 demonstrates the
 standard language identification (LID) system, an i-vector based recipe
 that uses a full-covariance GMM-UBM and a logistic regression model.
 The subdirectory v2 demonstrates an LID system in which a time-delay
 deep neural network (TDNN) based UBM replaces the GMM-UBM of v1. The
 DNN is trained on about 1800 hours of the English portion of Fisher.


 The following LDC corpora are used during training:
    
   SRE 2008 training set:                LDC2011S05 
   CALLFRIEND Vietnamese:                LDC96S60     
   CALLFRIEND Tamil:                     LDC96S59
   CALLFRIEND Japanese:                  LDC96S53
   CALLFRIEND Hindi:                     LDC96S52
   CALLFRIEND German:                    LDC96S51
   CALLFRIEND Farsi:                     LDC96S50
   CALLFRIEND French:                    LDC96S48
   CALLFRIEND Standard Arabic:           LDC96S49
   CALLFRIEND Korean:                    LDC96S54
   CALLFRIEND Mainland Chinese Mandarin: LDC96S55
   CALLFRIEND Taiwan Chinese Mandarin:   LDC96S56
   CALLFRIEND Caribbean Spanish:         LDC96S57
   CALLFRIEND Non-Caribbean Spanish:     LDC96S58
   LRE 1996:                             LDC2006S31
   LRE 2003:                             LDC2006S31
   LRE 2005:                             LDC2008S05
   LRE 2007 Training Set:                LDC2009S05
   LRE 2009:                             LDC2014S06
 
 Note that some of the corpora, e.g., SRE 2008 and the LREs used for
 training, contain multiple languages.  Because of this, it is not
 necessarily vital that all of the corpora be present on your system.
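
 Since a partial set of corpora may be enough, it can help to see which
 ones are available before starting. The sketch below is not part of the
 recipes. It assumes a hypothetical layout in which each corpus sits in a
 directory named after its LDC catalog ID under a single root; the helper
 name find_missing and that layout are illustrative assumptions, and only
 a few entries from the list above are shown.

```python
from pathlib import Path

# Catalog IDs copied from the corpus list above (subset only, for brevity).
CORPORA = {
    "SRE 2008 training set": "LDC2011S05",
    "CALLFRIEND Vietnamese": "LDC96S60",
    "LRE 2007 Training Set": "LDC2009S05",
    "LRE 2009": "LDC2014S06",
}


def find_missing(root, corpora=CORPORA):
    """Return the sorted names of corpora with no directory under root.

    Assumption: each corpus lives in <root>/<LDC catalog ID>. Adjust the
    path logic to match how the data is actually stored on your system.
    """
    root = Path(root)
    return sorted(
        name for name, ldc_id in corpora.items()
        if not (root / ldc_id).is_dir()
    )
```

 Calling find_missing with your own corpus root returns the names of the
 listed corpora that were not found there, so you can judge whether the
 remaining data still covers the languages you need.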
 
 The NIST 2007 Language Recognition Evaluation test set (LDC2009S04) is
 used for testing.
 
 This list will be updated as scripts for system development and testing
 (which will require additional data sources) are created.