// doc/examples.dox

// Copyright 2016 Fred Richardson  Allen Guo

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

/**
 \page examples Examples included with Kaldi

 When you check out the Kaldi source tree (see \ref install), you will find many
 sets of example scripts in the egs/ directory. The table below summarizes key facts
 about some of those example scripts; it is not an exhaustive list.
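<table border="1">
<tr>
<th>Name</th>
<th>BW</th>
<th>Lang</th>
<th>Train Domain</th>
<th>Train Hours</th>
<th>Train Speakers</th>
<th>License and Availability</th>
<th>Year Released</th>
<th>Speech Style</th>
<th>Test Domain</th>
<th>Kaldi Approx Perf</th>
<th>Model Type</th>
<th>LM Data</th>
<th>Lexicon</th>
</tr>
<tr>
<td>AMI</td>
<td>16 kHz</td>
<td>English<br>(+non-native)</td>
<td>Microphone: head-mike,<br>single and multiple<br>distance mikes</td>
<td>100</td>
<td>123 M<br>66 F</td>
<td>Free / Download<br>http://groups.inf.ed.ac.uk/ami/corpus/</td>
<td>2014</td>
<td>Meeting room</td>
<td>Same as train<br>no overlap(?)</td>
<td>~25% WER head<br>~45% WER distant</td>
<td>(T)DNN<br>(B)LSTM</td>
<td>AMI + (opt) Fisher</td>
<td>50K (CMU dict + kaldi sources)</td>
</tr>
<tr>
<td>Aspire</td>
<td></td>
<td>English</td>
<td>Conversational microphone<br>developed on telephone</td>
<td>see Fisher</td>
<td></td>
<td></td>
<td>2015</td>
<td></td>
<td></td>
<td>30.8% WER (dev or eval?)</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>WSJ</td>
<td>16 kHz</td>
<td>English</td>
<td>Clean close-mic<br>read speech</td>
<td>80</td>
<td></td>
<td>LDC<br>LDC93S6B (WSJ0) and LDC94S13B (WSJ1)</td>
<td>1993</td>
<td>Read speech</td>
<td>Same</td>
<td>6-7% WER</td>
<td></td>
<td>same as train</td>
<td>20k (CMU dict)</td>
</tr>
<tr>
<td>RM</td>
<td></td>
<td>English</td>
<td>read transcript<br>limited vocab and grammar</td>
<td></td>
<td></td>
<td>LDC<br>LDC93S3A</td>
<td>1987-1989</td>
<td>read speech</td>
<td>same</td>
<td>1-2% WER</td>
<td></td>
<td>predefined grammar</td>
<td>&lt;1K<br>RM dict</td>
</tr>
<tr>
<td>Timit</td>
<td>16 kHz</td>
<td>English</td>
<td>read transcript<br>very limited grammar</td>
<td></td>
<td>630</td>
<td></td>
<td>1986</td>
<td>read speech</td>
<td>same</td>
<td>~30-40% PER</td>
<td></td>
<td>none</td>
<td>~47 phones</td>
</tr>
<tr>
<td>fisher_english</td>
<td>8 kHz</td>
<td>English</td>
<td>Telephone speech<br>Auto-transcribed<br>(errorful transcriptions)</td>
<td>1,600</td>
<td>5203 M<br>7198 F</td>
<td>LDC<br>speech: LDC2004S13, LDC2005S13<br>transcript: LDC2004T19, LDC2005T19</td>
<td>2004/2005</td>
<td>CTS</td>
<td>Fisher (may<br>overlap with<br>train)</td>
<td>~22% WER</td>
<td>DNN</td>
<td>LDC Fisher</td>
<td>CMU dict<br>Size UNK</td>
</tr>
<tr>
<td>Switchboard 1</td>
<td>8 kHz</td>
<td>English</td>
<td>CTS</td>
<td>300</td>
<td></td>
<td>LDC<br>Train: LDC97S62<br>Mississippi State transcriptions<br>Eval: LDC2002S09 and LDC2002T43</td>
<td>1993/1997/2000</td>
<td>CTS</td>
<td>CTS<br>eval2000 (hub5)</td>
<td>~10% WER</td>
<td>LSTM</td>
<td>Mississippi Trans<br>+ (opt) Fisher</td>
<td>30K (CMU dict)</td>
</tr>
<tr>
<td>Switchboard 1<br>+ Fisher</td>
<td>8 kHz</td>
<td>English</td>
<td>CTS</td>
<td>see above</td>
<td>see above</td>
<td>see above</td>
<td>see above</td>
<td>CTS</td>
<td>eval2000<br>rt03</td>
<td>~12% eval2000<br>~19% rt03</td>
<td></td>
<td>see above</td>
<td>see above</td>
</tr>
<tr>
<td>Callhome<br>Egyptian</td>
<td></td>
<td>Egyptian<br>Colloquial<br>Arabic</td>
<td>CTS</td>
<td>120 conv</td>
<td></td>
<td>LDC<br>Speech: LDC97S45<br>Transcripts: LDC97T19<br>Lexicon: LDC99L22</td>
<td>1997</td>
<td>CTS</td>
<td>hub5 arabic<br>LDC2002S22<br>LDC2002T39</td>
<td>50-60% WER</td>
<td></td>
<td>Train trans</td>
<td>LDC dict</td>
</tr>
<tr>
<td>Corpus of<br>Spontaneous<br>Japanese</td>
<td></td>
<td>Japanese</td>
<td>Mixed style<br>Close-talking mic</td>
<td>650 hours<br>(240 hr train)</td>
<td>&gt;1,400</td>
<td>Unclear how to get this<br>http://www.ninjal.ac.jp/english/products/csj/<br>http://pj.ninjal.ac.jp/corpus_center/csj/</td>
<td>2004</td>
<td>Mixed</td>
<td></td>
<td>9-10% WER</td>
<td></td>
<td>UNK</td>
<td>UNK</td>
</tr>
<tr>
<td>Fisher Spanish<br>Callhome Spanish</td>
<td></td>
<td>Caribbean<br>Spanish</td>
<td>CTS</td>
<td>Fisher: 163 hrs<br>Callhome: 60 hrs?<br>120 30min conv</td>
<td>Fisher: 136<br>Callhome:</td>
<td>LDC<br>Fisher speech: LDC2010S01<br>Fisher transcripts: LDC2010T04<br>Callhome speech: LDC96S35<br>Callhome transcripts: LDC96T17</td>
<td>Fisher: 2010<br>Callhome: 1996</td>
<td>CTS</td>
<td>Kaldi subset<br>of Fisher</td>
<td>29-30% WER</td>
<td></td>
<td>Fisher trans</td>
<td>LDC96L16</td>
</tr>
<tr>
<td>Gale Arabic<br>Phase 2</td>
<td>16 kHz</td>
<td>Arabic</td>
<td>Broadcast<br>Conversational/Report</td>
<td>320 train<br>9.3 test</td>
<td></td>
<td>LDC2013S02 LDC2014S07<br>LDC2013S07 LDC2014T17<br>LDC2013T17<br>LDC2013T04</td>
<td>Collected<br>2006/2007</td>
<td>Broadcast<br>Conversational<br>and Report</td>
<td></td>
<td>Report: 13% WER<br>Conver: 28% WER<br>Comb: 24% WER</td>
<td>LSTM</td>
<td>LDC2013T17<br>LDC2013T04<br>LDC2014T17</td>
<td>http://alt.qcri.org/</td>
</tr>
<tr>
<td>Gale Mandarin</td>
<td>16 kHz</td>
<td>Mandarin<br>Chinese</td>
<td>Broadcast</td>
<td>126</td>
<td></td>
<td>LDC2013S08 LDC2013T20</td>
<td>2006-2007</td>
<td>Broadcast</td>
<td>Same as train</td>
<td>17.5% WER [1]</td>
<td></td>
<td>LDC2013S08<br>LDC2013T20</td>
<td>Same as HKUST below</td>
</tr>
<tr>
<td>hkust<br>EARS RT04F data<br>dev and train [2]</td>
<td>8 kHz</td>
<td>Mandarin<br>Chinese</td>
<td>Telephone Conversational</td>
<td>~145</td>
<td>~873</td>
<td>LDC2005S15 LDC2005T32</td>
<td>2004</td>
<td>Conversational</td>
<td>Same as train</td>
<td>33.5% CER</td>
<td></td>
<td>Acoustic trans<br>(very little)</td>
<td>Both Eng and Man.<br>CMU dict used for Eng<br>mdbg dict used for Man<br>http://www.mdbg.net</td>
</tr>
<tr>
<td>librispeech [3]</td>
<td>16 kHz</td>
<td>English</td>
<td>Read transcription</td>
<td>100 - 960<br>(460</td>
<td>F: 125-1128<br>M: 126-1167</td>
<td>http://www.openslr.org/12/</td>
<td>2015</td>
<td>Read trans</td>
<td>Librispeech</td>
<td>~5%</td>
<td></td>
<td>Large (books)</td>
<td>cmu (with sequitur G2P)</td>
</tr>
<tr>
<td>reverb</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>sprakbanken</td>
<td></td>
<td>Danish</td>
<td>Read transcript?</td>
<td>350</td>
<td></td>
<td>Free download<br>http://www.nb.no/sprakbanken/#ticketsfrom?lang=en</td>
<td>2012</td>
<td>Read/Dictation</td>
<td>Same as train</td>
<td>14% WER</td>
<td></td>
<td>NST Provided</td>
<td>NST Provided?</td>
</tr>
<tr>
<td>vystadial_en [4]</td>
<td>8 kHz</td>
<td>English</td>
<td>Telephone, dialog system</td>
<td>41</td>
<td>unk</td>
<td>Free</td>
<td>2014</td>
<td>Dialog sys</td>
<td>Same as train</td>
<td>~11% WER</td>
<td>GMM/HMM</td>
<td>Train trans</td>
<td>CMU + 250</td>
</tr>
<tr>
<td>vystadial_cz [4]</td>
<td>8 kHz</td>
<td>Czech</td>
<td>Telephone, dialog system</td>
<td>15</td>
<td>unk</td>
<td>Free</td>
<td>2014</td>
<td>Dialog sys</td>
<td>Same as train</td>
<td>~50% WER</td>
<td>GMM/HMM</td>
<td>Train trans</td>
<td>Rule derived</td>
</tr>
<tr>
<td>chime3</td>
<td>16 kHz</td>
<td>English</td>
<td>Read trans, simulated<br>and real noise</td>
<td>18</td>
<td>WSJ0 + 4</td>
<td>Not clear (Chime performers)</td>
<td>2015</td>
<td>Read<br>transcript</td>
<td>Same as train<br>(same channels!)</td>
<td>~12% WER real (4 spkrs)<br>~12% WER simu</td>
<td></td>
<td>Official WSJ0 5K<br>trans</td>
<td>WSJ0</td>
</tr>
<tr>
<td>voxforge</td>
<td>16 kHz</td>
<td>English</td>
<td>Read trans</td>
<td>&gt;75hrs</td>
<td>unk</td>
<td>Free GPL</td>
<td>2008?</td>
<td>Read trans</td>
<td>unk</td>
<td>unk</td>
<td></td>
<td>Train</td>
<td>cmu + g2p for oov</td>
</tr>
<tr>
<td>Tedlium</td>
<td>16 kHz</td>
<td>English</td>
<td>Presentation/talk</td>
<td>118</td>
<td>666</td>
<td>Free download</td>
<td>2014?</td>
<td>Presentation</td>
<td>Same as train</td>
<td>~10% WER</td>
<td></td>
<td>Cantab provided LM</td>
<td>Cantab provided dict</td>
</tr>
</table>
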
[1] "Audio Augmentation for Speech Recognition" Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur.
[2] There should be more Mandarin data from rt04f - 50 hours of dev data I believe (see LDC2004E67, LDC2004E68). There should also be eval data. See https://www.ldc.upenn.edu/collaborations/past-projects/gale/data/gale-pubs.
[3] See http://www.danielpovey.com/files/2015_icassp_librispeech.pdf for details. Acoustic and language models are available online.
[4] See http://www.lrec-conf.org/proceedings/lrec2014/pdf/535_Paper.pdf.

*/