Blame view
egs/sitw/v2/README.txt
1.06 KB
8dcb6dfcb first commit |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
This recipe replaces i-vectors used in the v1 recipe with embeddings extracted from a deep neural network. In the scripts, we refer to these embeddings as "x-vectors." The recipe in local/nnet3/xvector/tuning/run_xvector_1a.sh is closesly based on the following paper: @inproceedings{snyder2018xvector, title={X-vectors: Robust DNN Embeddings for Speaker Recognition}, author={Snyder, D. and Garcia-Romero, D. and Sell, G. and Povey, D. and Khudanpur, S.}, booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, year={2018}, organization={IEEE}, url={http://www.danielpovey.com/files/2018_icassp_xvectors.pdf} } The recipe uses the following datasets: Evaluation Speakers in the Wild http://www.speech.sri.com/projects/sitw System Development VoxCeleb 1 http://www.robots.ox.ac.uk/~vgg/data/voxceleb VoxCeleb 2 http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 MUSAN http://www.openslr.org/17 RIR_NOISES http://www.openslr.org/28 |