This recipe replaces the standard unsupervised GMM of the v1 recipe with a 
 UBM that uses a time-delay deep neural network (TDNN).  Posteriors from the
 TDNN are used in conjunction with features extracted using a standard approach
 for speaker recognition, to create the sufficient statistics for i-vector
 extraction.  The recipe also demonstrates a lightweight alternative in which
 a supervised GMM is derived from the TDNN posteriors. The recipe is based on
 http://www.danielpovey.com/files/2015_asru_tdnn_ubm.pdf. See run.sh for 
 updated results.
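
As a rough illustration of the statistics step described above (this is a
sketch, not the recipe's actual implementation), the code below shows how
per-frame posteriors and speaker-recognition features can be combined into
zeroth- and first-order statistics.  The function name accumulate_stats, the
toy inputs, and the plain-list representation are assumptions made for this
example only.

```python
def accumulate_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics over frames.

    posteriors: list of T rows, each holding C class posteriors for one frame
                (hypothetically, the TDNN output for that frame).
    features:   list of T rows, each holding D feature values for the same
                frame (hypothetically, the speaker-recognition features).
    Returns (zeroth, first) where
        zeroth[c]   = sum over frames of posteriors[t][c]
        first[c][d] = sum over frames of posteriors[t][c] * features[t][d]
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    if not posteriors:
        return [], []

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for post, feat in zip(posteriors, features):
        for c, gamma in enumerate(post):
            zeroth[c] += gamma              # soft count for class c
            for d, x in enumerate(feat):
                first[c][d] += gamma * x    # posterior-weighted feature sum
    return zeroth, first


# Toy input (made up): 3 frames, 2 classes, 2-dimensional features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[2.0, 4.0], [1.0, 1.0], [3.0, 5.0]]
zeroth, first = accumulate_stats(posteriors, features)
```

Dividing first[c] by zeroth[c] gives a posterior-weighted mean for class c,
which is roughly the kind of quantity a GMM supervised by the TDNN posteriors
would be estimated from.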

 The following describes the data required for system development (in addition
 to the test data described in ../README.txt).  We use SWBD and the older
 (pre-2010) SREs to train the supervised GMM and the i-vector extractor.  To
 create an in-domain system, the SREs are also needed to train the PLDA
 backend.  The TDNN is trained on Fisher English.
 
     Corpus              LDC Catalog No.
     SWBD2 Phase 2       LDC99S79
     SWBD2 Phase 3       LDC2002S06
     SWBD Cellular 1     LDC2001S13
     SWBD Cellular 2     LDC2004S07
     SRE2004             LDC2006S44
     SRE2005 Train       LDC2011S01
     SRE2005 Test        LDC2011S04
     SRE2006 Train       LDC2011S09
     SRE2006 Test 1      LDC2011S10
     SRE2006 Test 2      LDC2012S01
     SRE2008 Train       LDC2011S05
     SRE2008 Test        LDC2011S08
     Fisher speech       LDC2004S13, LDC2005S13 
     Fisher test         LDC2004T19, LDC2005T19