# CHiME-4 1ch track results
# These results are based on Hori et al., "The MERL/SRI system for the 3rd CHiME challenge using beamforming,
# robust feature extraction, and advanced speech recognition," in Proc. ASRU'15.
# Please cite the paper if you find this baseline useful.
# Note that the results below differ from those in the paper because this recipe does not include
# SRI's robust features or system combination.

GMM noisy multi-condition without enhancement
exp/tri3b_tr05_multi_noisy/best_wer_isolated_1ch_track.result
-------------------
dt05_simu WER: 24.48% (Average), 20.37% (BUS), 29.78% (CAFE), 20.49% (PEDESTRIAN), 27.27% (STREET)
-------------------
dt05_real WER: 22.16% (Average), 27.32% (BUS), 23.07% (CAFE), 16.29% (PEDESTRIAN), 21.96% (STREET)
-------------------
et05_simu WER: 33.30% (Average), 26.65% (BUS), 38.40% (CAFE), 34.68% (PEDESTRIAN), 33.47% (STREET)
-------------------
et05_real WER: 37.54% (Average), 51.92% (BUS), 39.67% (CAFE), 34.04% (PEDESTRIAN), 24.54% (STREET)
-------------------

GMM noisy multi-condition without enhancement using 6 channel data
exp/tri3b_tr05_multi_noisy/best_wer_isolated_1ch_track.result
-------------------
best overall dt05 WER 22.32% (language model weight = 10)
-------------------
dt05_simu WER: 23.24% (Average), 19.28% (BUS), 28.41% (CAFE), 19.16% (PEDESTRIAN), 26.12% (STREET)
-------------------
dt05_real WER: 21.40% (Average), 25.86% (BUS), 21.81% (CAFE), 16.80% (PEDESTRIAN), 21.12% (STREET)
-------------------
et05_simu WER: 32.03% (Average), 25.42% (BUS), 36.25% (CAFE), 33.34% (PEDESTRIAN), 33.10% (STREET)
-------------------
et05_real WER: 36.14% (Average), 49.28% (BUS), 38.79% (CAFE), 32.44% (PEDESTRIAN), 24.06% (STREET)
-------------------

GMM noisy multi-condition without enhancement using 6 channel data plus enhanced data
exp/tri3b_tr05_multi_noisy/best_wer_isolated_1ch_track.result
-------------------
best overall dt05 WER 22.28% (language model weight = 10)
-------------------
dt05_simu WER: 23.16% (Average), 19.76% (BUS), 28.14% (CAFE), 19.13% (PEDESTRIAN), 25.60% (STREET)
-------------------
dt05_real WER: 21.39% (Average), 25.56% (BUS), 23.01% (CAFE), 16.12% (PEDESTRIAN), 20.88% (STREET)
-------------------
et05_simu WER: 32.18% (Average), 25.33% (BUS), 37.37% (CAFE), 33.36% (PEDESTRIAN), 32.67% (STREET)
-------------------
et05_real WER: 35.54% (Average), 49.07% (BUS), 38.94% (CAFE), 31.60% (PEDESTRIAN), 22.56% (STREET)
-------------------

GMM noisy multi-condition with BLSTM masking using 6 channel data
exp/tri3b_tr05_multi_noisy/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 28.82% (language model weight = 14)
-------------------
dt05_simu WER: 28.54% (Average), 25.46% (BUS), 33.47% (CAFE), 25.19% (PEDESTRIAN), 30.06% (STREET)
-------------------
dt05_real WER: 29.10% (Average), 33.46% (BUS), 31.80% (CAFE), 25.71% (PEDESTRIAN), 25.42% (STREET)
-------------------
et05_simu WER: 36.10% (Average), 30.97% (BUS), 40.42% (CAFE), 35.82% (PEDESTRIAN), 37.19% (STREET)
-------------------
et05_real WER: 41.84% (Average), 52.57% (BUS), 46.41% (CAFE), 39.87% (PEDESTRIAN), 28.52% (STREET)
-------------------

GMM noisy multi-condition with BLSTM masking using 6 channel data plus enhanced data
exp/tri3b_tr05_multi_noisy/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 22.72% (language model weight = 13)
-------------------
dt05_simu WER: 23.37% (Average), 20.71% (BUS), 28.26% (CAFE), 19.85% (PEDESTRIAN), 24.66% (STREET)
-------------------
dt05_real WER: 22.07% (Average), 25.92% (BUS), 24.32% (CAFE), 18.47% (PEDESTRIAN), 19.58% (STREET)
-------------------
et05_simu WER: 30.41% (Average), 24.08% (BUS), 35.86% (CAFE), 30.80% (PEDESTRIAN), 30.89% (STREET)
-------------------
et05_real WER: 34.02% (Average), 44.68% (BUS), 37.19% (CAFE), 31.73% (PEDESTRIAN), 22.49% (STREET)
-------------------

DNN sMBR
exp/tri4a_dnn_tr05_multi_noisy_smbr_i1lats/best_wer_isolated_1ch_track.result
-------------------
best overall dt05 WER 15.17% (language model weight = 11)
 (Number of iterations = 4)
-------------------
dt05_simu WER: 15.67% (Average), 14.09% (BUS), 18.97% (CAFE), 12.76% (PEDESTRIAN), 16.89% (STREET)
-------------------
dt05_real WER: 14.67% (Average), 18.97% (BUS), 15.28% (CAFE), 9.88% (PEDESTRIAN), 14.56% (STREET)
-------------------
et05_simu WER: 24.13% (Average), 19.65% (BUS), 27.57% (CAFE), 23.14% (PEDESTRIAN), 26.17% (STREET)
-------------------
et05_real WER: 27.68% (Average), 40.40% (BUS), 28.95% (CAFE), 24.25% (PEDESTRIAN), 17.13% (STREET)
-------------------

DNN sMBR using all 6 channel data
-------------------
best overall dt05 WER 12.84% (language model weight = 12)
 (Number of iterations = 3)
-------------------
dt05_simu WER: 13.78% (Average), 12.26% (BUS), 17.05% (CAFE), 10.96% (PEDESTRIAN), 14.85% (STREET)
-------------------
dt05_real WER: 11.90% (Average), 15.44% (BUS), 12.77% (CAFE), 8.19% (PEDESTRIAN), 11.19% (STREET)
-------------------
et05_simu WER: 20.73% (Average), 16.08% (BUS), 24.58% (CAFE), 19.97% (PEDESTRIAN), 22.28% (STREET)
-------------------
et05_real WER: 21.91% (Average), 30.49% (BUS), 24.37% (CAFE), 18.85% (PEDESTRIAN), 13.91% (STREET)
-------------------

5-gram rescoring
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_isolated_1ch_track_5gkn_5k.result
-------------------
best overall dt05 WER 13.46% (language model weight = 11)
-------------------
dt05_simu WER: 13.99% (Average), 13.02% (BUS), 16.76% (CAFE), 11.12% (PEDESTRIAN), 15.07% (STREET)
-------------------
dt05_real WER: 12.93% (Average), 16.89% (BUS), 13.48% (CAFE), 8.53% (PEDESTRIAN), 12.82% (STREET)
-------------------
et05_simu WER: 22.32% (Average), 17.82% (BUS), 25.48% (CAFE), 21.70% (PEDESTRIAN), 24.30% (STREET)
-------------------
et05_real WER: 24.92% (Average), 37.52% (BUS), 26.45% (CAFE), 21.28% (PEDESTRIAN), 14.44% (STREET)
-------------------

5-gram rescoring using all 6 channel data
-------------------
best overall dt05 WER 11.07% (language model weight = 12)
-------------------
dt05_simu WER: 11.88% (Average), 11.15% (BUS), 15.04% (CAFE), 8.89% (PEDESTRIAN), 12.43% (STREET)
-------------------
dt05_real WER: 10.26% (Average), 13.60% (BUS), 10.66% (CAFE), 6.79% (PEDESTRIAN), 10.00% (STREET)
-------------------
et05_simu WER: 18.67% (Average), 13.58% (BUS), 22.21% (CAFE), 18.53% (PEDESTRIAN), 20.38% (STREET)
-------------------
et05_real WER: 19.51% (Average), 28.04% (BUS), 21.52% (CAFE), 16.14% (PEDESTRIAN), 12.33% (STREET)
-------------------

RNNLM
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_isolated_1ch_track_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 12.28% (language model weight = 11)
-------------------
dt05_simu WER: 12.98% (Average), 11.90% (BUS), 15.90% (CAFE), 9.94% (PEDESTRIAN), 14.19% (STREET)
-------------------
dt05_real WER: 11.57% (Average), 15.13% (BUS), 11.81% (CAFE), 7.42% (PEDESTRIAN), 11.90% (STREET)
-------------------
et05_simu WER: 20.84% (Average), 16.49% (BUS), 23.91% (CAFE), 20.25% (PEDESTRIAN), 22.71% (STREET)
-------------------
et05_real WER: 23.70% (Average), 35.93% (BUS), 24.60% (CAFE), 19.94% (PEDESTRIAN), 14.36% (STREET)
-------------------

RNNLM using all 6 channel data
-------------------
best overall dt05 WER 9.99% (language model weight = 14)
-------------------
dt05_simu WER: 11.02% (Average), 10.19% (BUS), 14.23% (CAFE), 8.20% (PEDESTRIAN), 11.45% (STREET)
-------------------
dt05_real WER: 8.97% (Average), 12.04% (BUS), 9.38% (CAFE), 5.65% (PEDESTRIAN), 8.82% (STREET)
-------------------
et05_simu WER: 17.31% (Average), 12.81% (BUS), 20.32% (CAFE), 17.03% (PEDESTRIAN), 19.09% (STREET)
-------------------
et05_real WER: 18.10% (Average), 26.58% (BUS), 19.97% (CAFE), 14.44% (PEDESTRIAN), 11.43% (STREET)
-------------------

TDNN using all 6 channel data
exp/chain/tdnniso_sp/best_wer_beamformit_5mics.result
-------------------
best overall dt05 WER 9.56% (language model weight = 10)
-------------------
dt05_simu WER: 10.23% (Average), 8.86% (BUS), 13.13% (CAFE), 7.94% (PEDESTRIAN), 11.00% (STREET)
-------------------
dt05_real WER: 8.89% (Average), 11.90% (BUS), 8.54% (CAFE), 6.09% (PEDESTRIAN), 9.03% (STREET)
-------------------
et05_simu WER: 16.48% (Average), 12.87% (BUS), 18.60% (CAFE), 15.52% (PEDESTRIAN), 18.94% (STREET)
-------------------
et05_real WER: 16.34% (Average), 24.32% (BUS), 16.51% (CAFE), 13.43% (PEDESTRIAN), 11.11% (STREET)
-------------------

TDNN+RNNLM using all 6 channel data
exp/chain/tdnniso_sp_smbr_lmrescore/best_wer_beamformit_5mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 7.21% (language model weight = 11)
-------------------
dt05_simu WER: 7.78% (Average), 6.52% (BUS), 10.27% (CAFE), 5.69% (PEDESTRIAN), 8.66% (STREET)
-------------------
dt05_real WER: 6.64% (Average), 9.06% (BUS), 6.62% (CAFE), 4.26% (PEDESTRIAN), 6.61% (STREET)
-------------------
et05_simu WER: 13.54% (Average), 10.22% (BUS), 15.07% (CAFE), 12.94% (PEDESTRIAN), 15.93% (STREET)
-------------------
et05_real WER: 12.92% (Average), 20.79% (BUS), 12.35% (CAFE), 9.62% (PEDESTRIAN), 8.91% (STREET)
-------------------

TDNN with BLSTM masking using all 6 channel data
exp/chain/tdnn1a_sp/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 18.00% (language model weight = 13)
-------------------
dt05_simu WER: 18.81% (Average), 15.34% (BUS), 23.58% (CAFE), 15.27% (PEDESTRIAN), 21.06% (STREET)
-------------------
dt05_real WER: 17.18% (Average), 21.12% (BUS), 19.45% (CAFE), 11.61% (PEDESTRIAN), 16.53% (STREET)
-------------------
et05_simu WER: 25.85% (Average), 20.06% (BUS), 30.13% (CAFE), 26.88% (PEDESTRIAN), 26.32% (STREET)
-------------------
et05_real WER: 27.68% (Average), 37.88% (BUS), 29.51% (CAFE), 24.74% (PEDESTRIAN), 18.60% (STREET)
-------------------

TDNN+RNNLM with BLSTM masking using all 6 channel data
exp/chain/tdnn1a_sp/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 14.38% (language model weight = 14)
-------------------
dt05_simu WER: 15.62% (Average), 12.36% (BUS), 20.46% (CAFE), 12.11% (PEDESTRIAN), 17.55% (STREET)
-------------------
dt05_real WER: 13.15% (Average), 16.43% (BUS), 15.21% (CAFE), 8.59% (PEDESTRIAN), 12.37% (STREET)
-------------------
et05_simu WER: 21.61% (Average), 16.01% (BUS), 25.87% (CAFE), 22.15% (PEDESTRIAN), 22.39% (STREET)
-------------------
et05_real WER: 22.47% (Average), 32.34% (BUS), 24.08% (CAFE), 18.91% (PEDESTRIAN), 14.57% (STREET)
-------------------

TDNN with BLSTM masking using all 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 11.73% (language model weight = 12)
-------------------
dt05_simu WER: 13.06% (Average), 10.78% (BUS), 17.20% (CAFE), 10.15% (PEDESTRIAN), 14.10% (STREET)
-------------------
dt05_real WER: 10.40% (Average), 13.44% (BUS), 10.72% (CAFE), 7.29% (PEDESTRIAN), 10.16% (STREET)
-------------------
et05_simu WER: 19.48% (Average), 14.48% (BUS), 23.10% (CAFE), 19.84% (PEDESTRIAN), 20.49% (STREET)
-------------------
et05_real WER: 19.08% (Average), 27.43% (BUS), 19.76% (CAFE), 16.93% (PEDESTRIAN), 12.22% (STREET)
-------------------

TDNN+RNNLM with BLSTM masking using all 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_single_BLSTMmask.result
-------------------
best overall dt05 WER 8.95% (language model weight = 13)
-------------------
dt05_simu WER: 10.28% (Average), 8.51% (BUS), 13.88% (CAFE), 7.58% (PEDESTRIAN), 11.17% (STREET)
-------------------
dt05_real WER: 7.62% (Average), 10.25% (BUS), 7.86% (CAFE), 5.31% (PEDESTRIAN), 7.05% (STREET)
-------------------
et05_simu WER: 16.18% (Average), 12.03% (BUS), 18.71% (CAFE), 16.62% (PEDESTRIAN), 17.35% (STREET)
-------------------
et05_real WER: 15.08% (Average), 22.96% (BUS), 15.45% (CAFE), 12.74% (PEDESTRIAN), 9.17% (STREET)
-------------------
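
# A minimal sketch for reading the WER lines in this file into a table. It is not part of the
# recipe; the names parse_wer_line and dt05_overall are hypothetical helpers. In the entries
# listed here, the "best overall dt05 WER" figure agrees to within 0.01 with the unweighted mean
# of the dt05_simu and dt05_real averages. That is an observation from these numbers, not a
# statement about how the scoring script computes it.

```python
import re

# Matches lines such as "dt05_real WER: 21.40% (Average), 25.86% (BUS), ..."
WER_LINE = re.compile(r"^(?P<set>\w+) WER: (?P<body>.+)$")
# Matches one "25.86% (BUS)" field inside such a line.
FIELD = re.compile(r"(?P<wer>\d+(?:\.\d+)?)% \((?P<name>[A-Za-z]+)\)")


def parse_wer_line(line):
    """Return (set_name, {condition: wer}) for a WER line, or None otherwise."""
    m = WER_LINE.match(line.strip())
    if m is None:
        return None
    scores = {
        f.group("name").upper(): float(f.group("wer"))
        for f in FIELD.finditer(m.group("body"))
    }
    return m.group("set"), scores


def dt05_overall(simu_scores, real_scores):
    """Unweighted mean of the two dt05 averages (assumed, see note above)."""
    return (simu_scores["AVERAGE"] + real_scores["AVERAGE"]) / 2.0


# Example using the "GMM ... using 6 channel data" entry from this file.
simu = parse_wer_line(
    "dt05_simu WER: 23.24% (Average), 19.28% (BUS), 28.41% (CAFE), "
    "19.16% (PEDESTRIAN), 26.12% (STREET)"
)
real = parse_wer_line(
    "dt05_real WER: 21.40% (Average), 25.86% (BUS), 21.81% (CAFE), "
    "16.80% (PEDESTRIAN), 21.12% (STREET)"
)
overall = dt05_overall(simu[1], real[1])
print(f"{overall:.2f}")
```

# Separator lines and "best overall" lines do not match the pattern, so parse_wer_line
# returns None for them and they can be skipped when scanning the whole file.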