# CHiME-4 2ch track results
# The result is based on Hori et al, "The MERL/SRI system for the 3rd CHiME challenge using beamforming,
# robust feature extraction, and advanced speech recognition," in Proc. ASRU'15,
# and please refer to the paper if you find the baseline useful.
# Note that the following results differ from those in the paper since we do not include
# SRI's robust features and system combination

GMM noisy multi-condition with beamformit
exp/tri3b_tr05_multi_noisy/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 17.69% (language model weight = 11)
-------------------
dt05_simu WER: 19.15% (Average), 16.14% (BUS), 23.55% (CAFE), 15.49% (PEDESTRIAN), 21.42% (STREET)
-------------------
dt05_real WER: 16.22% (Average), 20.12% (BUS), 16.25% (CAFE), 12.35% (PEDESTRIAN), 16.18% (STREET)
-------------------
et05_simu WER: 27.57% (Average), 20.17% (BUS), 31.81% (CAFE), 29.96% (PEDESTRIAN), 28.35% (STREET)
-------------------
et05_real WER: 29.03% (Average), 39.37% (BUS), 28.43% (CAFE), 27.56% (PEDESTRIAN), 20.77% (STREET)
-------------------

GMM noisy multi-condition with beamformit using 6 channel data
exp/tri3b_tr05_multi_noisy/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 17.26% (language model weight = 10)
-------------------
dt05_simu WER: 18.35% (Average), 15.44% (BUS), 22.51% (CAFE), 15.24% (PEDESTRIAN), 20.21% (STREET)
-------------------
dt05_real WER: 16.17% (Average), 19.12% (BUS), 16.74% (CAFE), 12.27% (PEDESTRIAN), 16.55% (STREET)
-------------------
et05_simu WER: 26.85% (Average), 20.08% (BUS), 30.84% (CAFE), 29.03% (PEDESTRIAN), 27.47% (STREET)
-------------------
et05_real WER: 27.91% (Average), 37.05% (BUS), 29.25% (CAFE), 25.37% (PEDESTRIAN), 19.97% (STREET)
-------------------

GMM noisy multi-condition with BLSTM masking using 6 channel data plus enhanced data
exp/tri3b_tr05_multi_noisy/best_wer_blstm_gev.result
-------------------
best overall dt05 WER 14.57% (language model weight = 10)
-------------------
dt05_simu WER: 15.62% (Average), 12.89% (BUS), 20.49% (CAFE), 14.22% (PEDESTRIAN), 14.90% (STREET)
-------------------
dt05_real WER: 13.52% (Average), 15.52% (BUS), 14.34% (CAFE), 11.57% (PEDESTRIAN), 12.67% (STREET)
-------------------
et05_simu WER: 19.05% (Average), 14.51% (BUS), 21.87% (CAFE), 20.41% (PEDESTRIAN), 19.39% (STREET)
-------------------
et05_real WER: 20.94% (Average), 26.66% (BUS), 21.52% (CAFE), 19.15% (PEDESTRIAN), 16.45% (STREET)
-------------------

DNN sMBR
exp/tri4a_dnn_tr05_multi_noisy_smbr_i1lats/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 11.63% (language model weight = 11)
 (Number of iterations = 4)
-------------------
dt05_simu WER: 12.36% (Average), 10.66% (BUS), 15.55% (CAFE), 9.87% (PEDESTRIAN), 13.36% (STREET)
-------------------
dt05_real WER: 10.90% (Average), 13.62% (BUS), 10.63% (CAFE), 7.69% (PEDESTRIAN), 11.65% (STREET)
-------------------
et05_simu WER: 19.04% (Average), 14.76% (BUS), 21.72% (CAFE), 19.22% (PEDESTRIAN), 20.45% (STREET)
-------------------
et05_real WER: 20.44% (Average), 30.02% (BUS), 19.95% (CAFE), 17.79% (PEDESTRIAN), 14.01% (STREET)
-------------------

DNN sMBR using all 6 channel data
-------------------
best overall dt05 WER 10.13% (language model weight = 12)
 (Number of iterations = 3)
-------------------
dt05_simu WER: 10.69% (Average), 9.19% (BUS), 13.79% (CAFE), 8.51% (PEDESTRIAN), 11.27% (STREET)
-------------------
dt05_real WER: 9.58% (Average), 11.58% (BUS), 10.44% (CAFE), 6.61% (PEDESTRIAN), 9.69% (STREET)
-------------------
et05_simu WER: 16.20% (Average), 12.36% (BUS), 18.81% (CAFE), 16.14% (PEDESTRIAN), 17.50% (STREET)
-------------------
et05_real WER: 16.72% (Average), 23.56% (BUS), 16.72% (CAFE), 14.91% (PEDESTRIAN), 11.71% (STREET)
-------------------

5-gram rescoring
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_beamformit_2mics_5gkn_5k.result
-------------------
best overall dt05 WER 10.17% (language model weight = 11)
-------------------
dt05_simu WER: 10.72% (Average), 9.37% (BUS), 13.70% (CAFE), 8.07% (PEDESTRIAN), 11.73% (STREET)
-------------------
dt05_real WER: 9.63% (Average), 11.93% (BUS), 9.75% (CAFE), 6.46% (PEDESTRIAN), 10.37% (STREET)
-------------------
et05_simu WER: 16.88% (Average), 12.08% (BUS), 19.70% (CAFE), 16.77% (PEDESTRIAN), 18.94% (STREET)
-------------------
et05_real WER: 18.07% (Average), 26.77% (BUS), 17.93% (CAFE), 14.76% (PEDESTRIAN), 12.83% (STREET)
-------------------

5-gram rescoring using all 6 channel data
-------------------
best overall dt05 WER 8.53% (language model weight = 13)
-------------------
dt05_simu WER: 9.03% (Average), 7.79% (BUS), 11.62% (CAFE), 7.06% (PEDESTRIAN), 9.66% (STREET)
-------------------
dt05_real WER: 8.02% (Average), 9.51% (BUS), 8.50% (CAFE), 5.83% (PEDESTRIAN), 8.26% (STREET)
-------------------
et05_simu WER: 13.86% (Average), 9.97% (BUS), 16.29% (CAFE), 13.58% (PEDESTRIAN), 15.60% (STREET)
-------------------
et05_real WER: 14.66% (Average), 21.20% (BUS), 14.48% (CAFE), 12.57% (PEDESTRIAN), 10.38% (STREET)
-------------------

RNNLM
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 8.86% (language model weight = 12)
-------------------
dt05_simu WER: 9.50% (Average), 8.19% (BUS), 12.15% (CAFE), 7.12% (PEDESTRIAN), 10.55% (STREET)
-------------------
dt05_real WER: 8.23% (Average), 10.90% (BUS), 7.96% (CAFE), 5.22% (PEDESTRIAN), 8.82% (STREET)
-------------------
et05_simu WER: 15.33% (Average), 10.66% (BUS), 18.21% (CAFE), 15.61% (PEDESTRIAN), 16.85% (STREET)
-------------------
et05_real WER: 16.58% (Average), 25.37% (BUS), 15.97% (CAFE), 13.53% (PEDESTRIAN), 11.45% (STREET)
-------------------

RNNLM using all 6 channel data
-------------------
best overall dt05 WER 7.46% (language model weight = 14)
-------------------
dt05_simu WER: 8.06% (Average), 6.73% (BUS), 10.55% (CAFE), 6.22% (PEDESTRIAN), 8.73% (STREET)
-------------------
dt05_real WER: 6.87% (Average), 8.41% (BUS), 7.17% (CAFE), 4.85% (PEDESTRIAN), 7.03% (STREET)
-------------------
et05_simu WER: 12.57% (Average), 8.85% (BUS), 14.85% (CAFE), 12.44% (PEDESTRIAN), 14.14% (STREET)
-------------------
et05_real WER: 13.33% (Average), 18.94% (BUS), 13.04% (CAFE), 11.85% (PEDESTRIAN), 9.49% (STREET)
-------------------

TDNN using all 6 channel data
exp/chain/tdnn1d_sp/best_wer_beamformit_5mics.result
-------------------
best overall dt05 WER 7.89% (language model weight = 10)
-------------------
dt05_simu WER: 8.23% (Average), 7.43% (BUS), 9.71% (CAFE), 6.64% (PEDESTRIAN), 9.16% (STREET)
-------------------
dt05_real WER: 7.55% (Average), 10.15% (BUS), 6.83% (CAFE), 5.28% (PEDESTRIAN), 7.93% (STREET)
-------------------
et05_simu WER: 13.15% (Average), 9.77% (BUS), 14.16% (CAFE), 13.43% (PEDESTRIAN), 15.24% (STREET)
-------------------
et05_real WER: 13.39% (Average), 19.63% (BUS), 11.64% (CAFE), 11.49% (PEDESTRIAN), 10.80% (STREET)
-------------------

TDNN+RNNLM using all 6 channel data
exp/chain/tdnn1d_sp_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 5.82% (language model weight = 11)
-------------------
dt05_simu WER: 6.08% (Average), 5.46% (BUS), 7.42% (CAFE), 4.94% (PEDESTRIAN), 6.49% (STREET)
-------------------
dt05_real WER: 5.57% (Average), 8.05% (BUS), 4.72% (CAFE), 3.72% (PEDESTRIAN), 5.80% (STREET)
-------------------
et05_simu WER: 9.90% (Average), 7.00% (BUS), 11.15% (CAFE), 10.05% (PEDESTRIAN), 11.41% (STREET)
-------------------
et05_real WER: 10.53% (Average), 16.90% (BUS), 8.65% (CAFE), 8.52% (PEDESTRIAN), 8.05% (STREET)
-------------------

TDNN using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_beamformit_5mics.result
-------------------
best overall dt05 WER 7.57% (language model weight = 10)
-------------------
dt05_simu WER: 8.18% (Average), 7.12% (BUS), 10.16% (CAFE), 6.33% (PEDESTRIAN), 9.12% (STREET)
-------------------
dt05_real WER: 6.96% (Average), 9.38% (BUS), 6.46% (CAFE), 4.91% (PEDESTRIAN), 7.09% (STREET)
-------------------
et05_simu WER: 13.14% (Average), 9.92% (BUS), 14.55% (CAFE), 13.26% (PEDESTRIAN), 14.83% (STREET)
-------------------
et05_real WER: 12.81% (Average), 19.27% (BUS), 10.66% (CAFE), 11.29% (PEDESTRIAN), 10.03% (STREET)
-------------------

TDNN+RNNLM using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 5.52% (language model weight = 10)
-------------------
dt05_simu WER: 6.02% (Average), 5.28% (BUS), 7.37% (CAFE), 4.60% (PEDESTRIAN), 6.81% (STREET)
-------------------
dt05_real WER: 5.03% (Average), 7.23% (BUS), 4.26% (CAFE), 3.26% (PEDESTRIAN), 5.35% (STREET)
-------------------
et05_simu WER: 10.35% (Average), 7.84% (BUS), 11.04% (CAFE), 10.55% (PEDESTRIAN), 11.95% (STREET)
-------------------
et05_real WER: 10.20% (Average), 16.21% (BUS), 8.18% (CAFE), 8.43% (PEDESTRIAN), 7.98% (STREET)
-------------------

TDNN with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_blstm_gev.result
-------------------
best overall dt05 WER 6.35% (language model weight = 9)
-------------------
dt05_simu WER: 7.03% (Average), 5.72% (BUS), 9.32% (CAFE), 6.28% (PEDESTRIAN), 6.78% (STREET)
-------------------
dt05_real WER: 5.66% (Average), 6.89% (BUS), 5.99% (CAFE), 4.44% (PEDESTRIAN), 5.34% (STREET)
-------------------
et05_simu WER: 8.80% (Average), 6.80% (BUS), 10.20% (CAFE), 8.37% (PEDESTRIAN), 9.84% (STREET)
-------------------
et05_real WER: 9.46% (Average), 13.42% (BUS), 8.31% (CAFE), 8.76% (PEDESTRIAN), 7.34% (STREET)
-------------------

TDNN+RNNLM with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_blstm_gev_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 4.41% (language model weight = 11)
-------------------
dt05_simu WER: 5.03% (Average), 4.13% (BUS), 6.83% (CAFE), 4.45% (PEDESTRIAN), 4.72% (STREET)
-------------------
dt05_real WER: 3.79% (Average), 4.68% (BUS), 3.94% (CAFE), 2.95% (PEDESTRIAN), 3.61% (STREET)
-------------------
et05_simu WER: 6.07% (Average), 4.52% (BUS), 6.93% (CAFE), 6.05% (PEDESTRIAN), 6.78% (STREET)
-------------------
et05_real WER: 6.93% (Average), 10.23% (BUS), 6.13% (CAFE), 6.41% (PEDESTRIAN), 4.97% (STREET)
-------------------

TDNN+RNNLM with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_blstm_gev_rnnlm_lstm_1a_w0.5_n100.result
-------------------
best overall dt05 WER 3.39% (language model weight = 10)
-------------------
dt05_simu WER: 3.94% (Average), 2.99% (BUS), 5.65% (CAFE), 3.44% (PEDESTRIAN), 3.67% (STREET)
-------------------
dt05_real WER: 2.85% (Average), 3.58% (BUS), 2.89% (CAFE), 2.07% (PEDESTRIAN), 2.85% (STREET)
-------------------
et05_simu WER: 5.03% (Average), 3.66% (BUS), 5.57% (CAFE), 4.87% (PEDESTRIAN), 6.03% (STREET)
-------------------
et05_real WER: 5.40% (Average), 7.81% (BUS), 4.71% (CAFE), 4.73% (PEDESTRIAN), 4.37% (STREET)
-------------------