# CHiME-4 2ch track results
# The results are based on Hori et al., "The MERL/SRI system for the 3rd CHiME challenge using beamforming,
# robust feature extraction, and advanced speech recognition," in Proc. ASRU'15;
# please cite the paper if you find this baseline useful.
# Note that the following results differ from those in the paper because we do not include
# SRI's robust features or system combination.
GMM noisy multi-condition with beamformit
exp/tri3b_tr05_multi_noisy/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 17.69% (language model weight = 11)
-------------------
dt05_simu WER: 19.15% (Average), 16.14% (BUS), 23.55% (CAFE), 15.49% (PEDESTRIAN), 21.42% (STREET)
-------------------
dt05_real WER: 16.22% (Average), 20.12% (BUS), 16.25% (CAFE), 12.35% (PEDESTRIAN), 16.18% (STREET)
-------------------
et05_simu WER: 27.57% (Average), 20.17% (BUS), 31.81% (CAFE), 29.96% (PEDESTRIAN), 28.35% (STREET)
-------------------
et05_real WER: 29.03% (Average), 39.37% (BUS), 28.43% (CAFE), 27.56% (PEDESTRIAN), 20.77% (STREET)
-------------------
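Each WER line above follows a fixed layout: a subset name, the environment-averaged WER, then one WER per noise environment. The minimal Python sketch below parses such a line and checks that the "Average" column is the unweighted mean of the four environments, which holds for the numbers in this file up to rounding. The regular expressions and helper names are hypothetical and are not part of the recipe's scoring scripts.

```python
import re

# Matches one result line, e.g. "dt05_real WER: 16.22% (Average), 20.12% (BUS), ..."
LINE_RE = re.compile(r"^(\w+) WER: (.*)$")
# Matches each "<value>% (<label>)" field on that line
FIELD_RE = re.compile(r"([\d.]+)% \((\w+)\)")


def parse_result_line(line):
    """Return (subset, {label: wer}) for a single WER line (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a WER line: {line!r}")
    subset, rest = m.groups()
    wers = {label: float(value) for value, label in FIELD_RE.findall(rest)}
    return subset, wers


def average_matches(wers, tol=0.01):
    """Check 'Average' against the unweighted mean of the four environments.

    The tolerance absorbs rounding: every value is printed to two decimals.
    """
    envs = [wers[e] for e in ("BUS", "CAFE", "PEDESTRIAN", "STREET")]
    return abs(sum(envs) / len(envs) - wers["Average"]) <= tol


line = ("dt05_real WER: 16.22% (Average), 20.12% (BUS), 16.25% (CAFE), "
        "12.35% (PEDESTRIAN), 16.18% (STREET)")
subset, wers = parse_result_line(line)
print(subset, wers["Average"], average_matches(wers))  # → dt05_real 16.22 True
```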
GMM noisy multi-condition with beamformit using 6 channel data
exp/tri3b_tr05_multi_noisy/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 17.26% (language model weight = 10)
-------------------
dt05_simu WER: 18.35% (Average), 15.44% (BUS), 22.51% (CAFE), 15.24% (PEDESTRIAN), 20.21% (STREET)
-------------------
dt05_real WER: 16.17% (Average), 19.12% (BUS), 16.74% (CAFE), 12.27% (PEDESTRIAN), 16.55% (STREET)
-------------------
et05_simu WER: 26.85% (Average), 20.08% (BUS), 30.84% (CAFE), 29.03% (PEDESTRIAN), 27.47% (STREET)
-------------------
et05_real WER: 27.91% (Average), 37.05% (BUS), 29.25% (CAFE), 25.37% (PEDESTRIAN), 19.97% (STREET)
-------------------
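In every block of this file, the "best overall dt05 WER" agrees, up to rounding, with the mean of the dt05_simu and dt05_real averages, and the language model weight is the one that minimises it. The sketch below assumes exactly that relationship. The input mapping is a hypothetical format, and the recipe's own scoring script may compute the combined figure differently (for example, by pooling errors over both subsets).

```python
def overall_dt05(simu_avg, real_avg):
    """Mean of the simulated and real dt05 averages (equal weighting assumed)."""
    return (simu_avg + real_avg) / 2


def best_lm_weight(results):
    """Pick the LM weight with the lowest overall dt05 WER.

    results: hypothetical {lm_weight: (dt05_simu_avg, dt05_real_avg)} mapping.
    Returns (lm_weight, overall_wer). Only dt05 is consulted, so the et05
    numbers are reported at a weight tuned on the development set.
    """
    return min(
        ((w, overall_dt05(s, r)) for w, (s, r) in results.items()),
        key=lambda item: item[1],
    )


# Reported dt05 averages of the block above (language model weight = 10)
print(round(overall_dt05(18.35, 16.17), 2))  # → 17.26
```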
GMM noisy multi-condition with BLSTM masking using 6 channel data plus enhanced data
exp/tri3b_tr05_multi_noisy/best_wer_blstm_gev.result
-------------------
best overall dt05 WER 14.57% (language model weight = 10)
-------------------
dt05_simu WER: 15.62% (Average), 12.89% (BUS), 20.49% (CAFE), 14.22% (PEDESTRIAN), 14.90% (STREET)
-------------------
dt05_real WER: 13.52% (Average), 15.52% (BUS), 14.34% (CAFE), 11.57% (PEDESTRIAN), 12.67% (STREET)
-------------------
et05_simu WER: 19.05% (Average), 14.51% (BUS), 21.87% (CAFE), 20.41% (PEDESTRIAN), 19.39% (STREET)
-------------------
et05_real WER: 20.94% (Average), 26.66% (BUS), 21.52% (CAFE), 19.15% (PEDESTRIAN), 16.45% (STREET)
-------------------
DNN sMBR
exp/tri4a_dnn_tr05_multi_noisy_smbr_i1lats/best_wer_beamformit_2mics.result
-------------------
best overall dt05 WER 11.63% (language model weight = 11)
(Number of iterations = 4)
-------------------
dt05_simu WER: 12.36% (Average), 10.66% (BUS), 15.55% (CAFE), 9.87% (PEDESTRIAN), 13.36% (STREET)
-------------------
dt05_real WER: 10.90% (Average), 13.62% (BUS), 10.63% (CAFE), 7.69% (PEDESTRIAN), 11.65% (STREET)
-------------------
et05_simu WER: 19.04% (Average), 14.76% (BUS), 21.72% (CAFE), 19.22% (PEDESTRIAN), 20.45% (STREET)
-------------------
et05_real WER: 20.44% (Average), 30.02% (BUS), 19.95% (CAFE), 17.79% (PEDESTRIAN), 14.01% (STREET)
-------------------
DNN sMBR using all 6 channel data
-------------------
best overall dt05 WER 10.13% (language model weight = 12)
(Number of iterations = 3)
-------------------
dt05_simu WER: 10.69% (Average), 9.19% (BUS), 13.79% (CAFE), 8.51% (PEDESTRIAN), 11.27% (STREET)
-------------------
dt05_real WER: 9.58% (Average), 11.58% (BUS), 10.44% (CAFE), 6.61% (PEDESTRIAN), 9.69% (STREET)
-------------------
et05_simu WER: 16.20% (Average), 12.36% (BUS), 18.81% (CAFE), 16.14% (PEDESTRIAN), 17.50% (STREET)
-------------------
et05_real WER: 16.72% (Average), 23.56% (BUS), 16.72% (CAFE), 14.91% (PEDESTRIAN), 11.71% (STREET)
-------------------
5-gram rescoring
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_beamformit_2mics_5gkn_5k.result
-------------------
best overall dt05 WER 10.17% (language model weight = 11)
-------------------
dt05_simu WER: 10.72% (Average), 9.37% (BUS), 13.70% (CAFE), 8.07% (PEDESTRIAN), 11.73% (STREET)
-------------------
dt05_real WER: 9.63% (Average), 11.93% (BUS), 9.75% (CAFE), 6.46% (PEDESTRIAN), 10.37% (STREET)
-------------------
et05_simu WER: 16.88% (Average), 12.08% (BUS), 19.70% (CAFE), 16.77% (PEDESTRIAN), 18.94% (STREET)
-------------------
et05_real WER: 18.07% (Average), 26.77% (BUS), 17.93% (CAFE), 14.76% (PEDESTRIAN), 12.83% (STREET)
-------------------
5-gram rescoring using all 6 channel data
-------------------
best overall dt05 WER 8.53% (language model weight = 13)
-------------------
dt05_simu WER: 9.03% (Average), 7.79% (BUS), 11.62% (CAFE), 7.06% (PEDESTRIAN), 9.66% (STREET)
-------------------
dt05_real WER: 8.02% (Average), 9.51% (BUS), 8.50% (CAFE), 5.83% (PEDESTRIAN), 8.26% (STREET)
-------------------
et05_simu WER: 13.86% (Average), 9.97% (BUS), 16.29% (CAFE), 13.58% (PEDESTRIAN), 15.60% (STREET)
-------------------
et05_real WER: 14.66% (Average), 21.20% (BUS), 14.48% (CAFE), 12.57% (PEDESTRIAN), 10.38% (STREET)
-------------------
RNNLM
exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 8.86% (language model weight = 12)
-------------------
dt05_simu WER: 9.50% (Average), 8.19% (BUS), 12.15% (CAFE), 7.12% (PEDESTRIAN), 10.55% (STREET)
-------------------
dt05_real WER: 8.23% (Average), 10.90% (BUS), 7.96% (CAFE), 5.22% (PEDESTRIAN), 8.82% (STREET)
-------------------
et05_simu WER: 15.33% (Average), 10.66% (BUS), 18.21% (CAFE), 15.61% (PEDESTRIAN), 16.85% (STREET)
-------------------
et05_real WER: 16.58% (Average), 25.37% (BUS), 15.97% (CAFE), 13.53% (PEDESTRIAN), 11.45% (STREET)
-------------------
RNNLM using all 6 channel data
-------------------
best overall dt05 WER 7.46% (language model weight = 14)
-------------------
dt05_simu WER: 8.06% (Average), 6.73% (BUS), 10.55% (CAFE), 6.22% (PEDESTRIAN), 8.73% (STREET)
-------------------
dt05_real WER: 6.87% (Average), 8.41% (BUS), 7.17% (CAFE), 4.85% (PEDESTRIAN), 7.03% (STREET)
-------------------
et05_simu WER: 12.57% (Average), 8.85% (BUS), 14.85% (CAFE), 12.44% (PEDESTRIAN), 14.14% (STREET)
-------------------
et05_real WER: 13.33% (Average), 18.94% (BUS), 13.04% (CAFE), 11.85% (PEDESTRIAN), 9.49% (STREET)
-------------------
TDNN using all 6 channel data
exp/chain/tdnn1d_sp/best_wer_beamformit_5mics.result
-------------------
best overall dt05 WER 7.89% (language model weight = 10)
-------------------
dt05_simu WER: 8.23% (Average), 7.43% (BUS), 9.71% (CAFE), 6.64% (PEDESTRIAN), 9.16% (STREET)
-------------------
dt05_real WER: 7.55% (Average), 10.15% (BUS), 6.83% (CAFE), 5.28% (PEDESTRIAN), 7.93% (STREET)
-------------------
et05_simu WER: 13.15% (Average), 9.77% (BUS), 14.16% (CAFE), 13.43% (PEDESTRIAN), 15.24% (STREET)
-------------------
et05_real WER: 13.39% (Average), 19.63% (BUS), 11.64% (CAFE), 11.49% (PEDESTRIAN), 10.80% (STREET)
-------------------
TDNN+RNNLM using all 6 channel data
exp/chain/tdnn1d_sp_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 5.82% (language model weight = 11)
-------------------
dt05_simu WER: 6.08% (Average), 5.46% (BUS), 7.42% (CAFE), 4.94% (PEDESTRIAN), 6.49% (STREET)
-------------------
dt05_real WER: 5.57% (Average), 8.05% (BUS), 4.72% (CAFE), 3.72% (PEDESTRIAN), 5.80% (STREET)
-------------------
et05_simu WER: 9.90% (Average), 7.00% (BUS), 11.15% (CAFE), 10.05% (PEDESTRIAN), 11.41% (STREET)
-------------------
et05_real WER: 10.53% (Average), 16.90% (BUS), 8.65% (CAFE), 8.52% (PEDESTRIAN), 8.05% (STREET)
-------------------
TDNN using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_beamformit_5mics.result
-------------------
best overall dt05 WER 7.57% (language model weight = 10)
-------------------
dt05_simu WER: 8.18% (Average), 7.12% (BUS), 10.16% (CAFE), 6.33% (PEDESTRIAN), 9.12% (STREET)
-------------------
dt05_real WER: 6.96% (Average), 9.38% (BUS), 6.46% (CAFE), 4.91% (PEDESTRIAN), 7.09% (STREET)
-------------------
et05_simu WER: 13.14% (Average), 9.92% (BUS), 14.55% (CAFE), 13.26% (PEDESTRIAN), 14.83% (STREET)
-------------------
et05_real WER: 12.81% (Average), 19.27% (BUS), 10.66% (CAFE), 11.29% (PEDESTRIAN), 10.03% (STREET)
-------------------
TDNN+RNNLM using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_beamformit_2mics_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 5.52% (language model weight = 10)
-------------------
dt05_simu WER: 6.02% (Average), 5.28% (BUS), 7.37% (CAFE), 4.60% (PEDESTRIAN), 6.81% (STREET)
-------------------
dt05_real WER: 5.03% (Average), 7.23% (BUS), 4.26% (CAFE), 3.26% (PEDESTRIAN), 5.35% (STREET)
-------------------
et05_simu WER: 10.35% (Average), 7.84% (BUS), 11.04% (CAFE), 10.55% (PEDESTRIAN), 11.95% (STREET)
-------------------
et05_real WER: 10.20% (Average), 16.21% (BUS), 8.18% (CAFE), 8.43% (PEDESTRIAN), 7.98% (STREET)
-------------------
TDNN with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp/best_wer_blstm_gev.result
-------------------
best overall dt05 WER 6.35% (language model weight = 9)
-------------------
dt05_simu WER: 7.03% (Average), 5.72% (BUS), 9.32% (CAFE), 6.28% (PEDESTRIAN), 6.78% (STREET)
-------------------
dt05_real WER: 5.66% (Average), 6.89% (BUS), 5.99% (CAFE), 4.44% (PEDESTRIAN), 5.34% (STREET)
-------------------
et05_simu WER: 8.80% (Average), 6.80% (BUS), 10.20% (CAFE), 8.37% (PEDESTRIAN), 9.84% (STREET)
-------------------
et05_real WER: 9.46% (Average), 13.42% (BUS), 8.31% (CAFE), 8.76% (PEDESTRIAN), 7.34% (STREET)
-------------------
TDNN+RNNLM with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_blstm_gev_rnnlm_5k_h300_w0.5_n100.result
-------------------
best overall dt05 WER 4.41% (language model weight = 11)
-------------------
dt05_simu WER: 5.03% (Average), 4.13% (BUS), 6.83% (CAFE), 4.45% (PEDESTRIAN), 4.72% (STREET)
-------------------
dt05_real WER: 3.79% (Average), 4.68% (BUS), 3.94% (CAFE), 2.95% (PEDESTRIAN), 3.61% (STREET)
-------------------
et05_simu WER: 6.07% (Average), 4.52% (BUS), 6.93% (CAFE), 6.05% (PEDESTRIAN), 6.78% (STREET)
-------------------
et05_real WER: 6.93% (Average), 10.23% (BUS), 6.13% (CAFE), 6.41% (PEDESTRIAN), 4.97% (STREET)
-------------------
TDNN+RNNLM (LSTM RNNLM, rnnlm_lstm_1a) with BLSTM masking using 6 channel data plus enhanced data
exp/chain/tdnn1a_sp_smbr_lmrescore/best_wer_blstm_gev_rnnlm_lstm_1a_w0.5_n100.result
-------------------
best overall dt05 WER 3.39% (language model weight = 10)
-------------------
dt05_simu WER: 3.94% (Average), 2.99% (BUS), 5.65% (CAFE), 3.44% (PEDESTRIAN), 3.67% (STREET)
-------------------
dt05_real WER: 2.85% (Average), 3.58% (BUS), 2.89% (CAFE), 2.07% (PEDESTRIAN), 2.85% (STREET)
-------------------
et05_simu WER: 5.03% (Average), 3.66% (BUS), 5.57% (CAFE), 4.87% (PEDESTRIAN), 6.03% (STREET)
-------------------
et05_real WER: 5.40% (Average), 7.81% (BUS), 4.71% (CAFE), 4.73% (PEDESTRIAN), 4.37% (STREET)
-------------------
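Rows are often compared by relative WER reduction. The small sketch below uses a hypothetical helper name and computes it for the et05_real averages of the first system in this file (GMM with BeamformIt, 29.03%) and the last one (TDNN+RNNLM with BLSTM masking, 5.40%).

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# et05_real averages: first vs. last system reported above
print(f"{relative_reduction(29.03, 5.40):.1f}%")  # → 81.4%
```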