README.txt 1.06 KB
edit raw blame history



1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26


This recipe replaces i-vectors used in the v1 recipe with embeddings extracted
 from a deep neural network.  In the scripts, we refer to these embeddings as
 "x-vectors."  The recipe in local/nnet3/xvector/tuning/run_xvector_1a.sh is
 closesly based on the following paper:

 @inproceedings{snyder2018xvector,
 title={X-vectors: Robust DNN Embeddings for Speaker Recognition},
 author={Snyder, D. and Garcia-Romero, D. and Sell, G. and Povey, D. and Khudanpur, S.},
 booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
 year={2018},
 organization={IEEE},
 url={http://www.danielpovey.com/files/2018_icassp_xvectors.pdf}
 }

 The recipe uses the following datasets:

 Evaluation
     
     Speakers in the Wild    http://www.speech.sri.com/projects/sitw

 System Development
     
     VoxCeleb 1              http://www.robots.ox.ac.uk/~vgg/data/voxceleb
     VoxCeleb 2              http://www.robots.ox.ac.uk/~vgg/data/voxceleb2
     MUSAN                   http://www.openslr.org/17
     RIR_NOISES              http://www.openslr.org/28