egs/lre07/README.txt
This directory (lre07) contains example recipes for the 2007 NIST
Language Recognition Evaluation (LRE). The subdirectory v1 demonstrates
the standard language identification (LID) system, an i-vector-based
recipe that uses a full-covariance GMM-UBM and a logistic regression
model. The subdirectory v2 demonstrates an LID system in which a
time-delay deep neural network based UBM replaces the GMM-UBM of v1.
The DNN is trained on about 1800 hours of the English portion of Fisher.

The following LDC corpora are used during training:

    SRE 2008 training set:                  LDC2011S05
    CALLFRIEND Vietnamese:                  LDC96S60
    CALLFRIEND Tamil:                       LDC96S59
    CALLFRIEND Japanese:                    LDC96S53
    CALLFRIEND Hindi:                       LDC96S52
    CALLFRIEND German:                      LDC96S51
    CALLFRIEND Farsi:                       LDC96S50
    CALLFRIEND French:                      LDC96S48
    CALLFRIEND Standard Arabic:             LDC96S49
    CALLFRIEND Korean:                      LDC96S54
    CALLFRIEND Mainland Chinese Mandarin:   LDC96S55
    CALLFRIEND Taiwan Chinese Mandarin:     LDC96S56
    CALLFRIEND Caribbean Spanish:           LDC96S57
    CALLFRIEND Non-Caribbean Spanish:       LDC96S58
    LRE 1996:                               LDC2006S31
    LRE 2003:                               LDC2006S31
    LRE 2005:                               LDC2008S05
    LRE 2007 Training Set:                  LDC2009S05
    LRE 2009:                               LDC2014S06

Note that some of the corpora, e.g., SRE 2008 and the LREs used for
training, contain multiple languages. Because of this, it isn't
strictly necessary for all of the corpora to be present on your system.

The NIST 2007 Language Recognition Evaluation (LDC2009S04) is used for
testing.

This list will be updated as scripts for system development and
testing (which will require additional data sources) are created.
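
Since not every training corpus has to be present, it can help to check which ones are available before running a recipe. The following is a minimal sketch and not part of the recipes themselves. It assumes, purely for illustration, that each corpus is unpacked into a directory named after its LDC catalog number under a single root. The function name find_missing and that directory layout are hypothetical; the catalog numbers are copied from the training list above.

```python
import os

# Training corpora and their LDC catalog numbers, as listed in this README.
TRAINING_CORPORA = {
    "SRE 2008 training set": "LDC2011S05",
    "CALLFRIEND Vietnamese": "LDC96S60",
    "CALLFRIEND Tamil": "LDC96S59",
    "CALLFRIEND Japanese": "LDC96S53",
    "CALLFRIEND Hindi": "LDC96S52",
    "CALLFRIEND German": "LDC96S51",
    "CALLFRIEND Farsi": "LDC96S50",
    "CALLFRIEND French": "LDC96S48",
    "CALLFRIEND Standard Arabic": "LDC96S49",
    "CALLFRIEND Korean": "LDC96S54",
    "CALLFRIEND Mainland Chinese Mandarin": "LDC96S55",
    "CALLFRIEND Taiwan Chinese Mandarin": "LDC96S56",
    "CALLFRIEND Caribbean Spanish": "LDC96S57",
    "CALLFRIEND Non-Caribbean Spanish": "LDC96S58",
    "LRE 1996": "LDC2006S31",
    "LRE 2003": "LDC2006S31",
    "LRE 2005": "LDC2008S05",
    "LRE 2007 Training Set": "LDC2009S05",
    "LRE 2009": "LDC2014S06",
}


def find_missing(corpus_root, corpora=TRAINING_CORPORA):
    """Return {corpus name: catalog number} for corpora not found on disk.

    Hypothetical layout: a corpus counts as present when the directory
    <corpus_root>/<catalog number> exists.
    """
    missing = {}
    for name, ldc_id in corpora.items():
        if not os.path.isdir(os.path.join(corpus_root, ldc_id)):
            missing[name] = ldc_id
    return missing
```

Calling find_missing with the root directory of your local corpus collection returns the entries that were not found, which can then be compared against the data preparation steps you intend to run.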