   This recipe replaces i-vectors used in the v1 recipe with embeddings extracted
   from a deep neural network.  In the scripts, we refer to these embeddings as
   "x-vectors."  The recipe in local/nnet3/xvector/tuning/run_xvector_1a.sh is
   closely based on the following paper:
  
   @inproceedings{snyder2018xvector,
   title={X-vectors: Robust DNN Embeddings for Speaker Recognition},
   author={Snyder, D. and Garcia-Romero, D. and Sell, G. and Povey, D. and Khudanpur, S.},
   booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   year={2018},
   organization={IEEE},
   url={http://www.danielpovey.com/files/2018_icassp_xvectors.pdf}
   }
  
   The recipe uses the following datasets:
  
   Evaluation
       
       Speakers in the Wild    http://www.speech.sri.com/projects/sitw
  
   System Development
       
       VoxCeleb 1              http://www.robots.ox.ac.uk/~vgg/data/voxceleb
       VoxCeleb 2              http://www.robots.ox.ac.uk/~vgg/data/voxceleb2
       MUSAN                   http://www.openslr.org/17
       RIR_NOISES              http://www.openslr.org/28