// doc/examples.dox
//
// Copyright 2016 Fred Richardson Allen Guo
//
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

/**
 \page examples Examples included with Kaldi

 When you check out the Kaldi source tree (see \ref install), you will find
 many sets of example scripts in the egs/ directory. The table below
 summarizes key facts about some of those example scripts; it is not an
 exhaustive list.

Name | BW | Lang | Train Domain | Train Hours | Train Speakers | License and Availability | Year Released | Speech Style | Test Domain | Kaldi Approx. Perf | Model Type | LM Data | Lexicon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AMI | 16 kHz | English (+non-native) | Microphone: head-mike, single and multiple distant mikes | 100 | 123 M, 66 F | Free download: http://groups.inf.ed.ac.uk/ami/corpus/ | 2014 | Meeting room | Same as train, no overlap(?) | ~25% WER head (T)DNN; ~45% WER distant (B)LSTM | | AMI + (opt) Fisher | 50K (CMU dict + Kaldi sources) |
Aspire | | English | Conversational microphone, developed on telephone | see Fisher | | | 2015 | | | 30.8% WER (dev or eval?) | | | |
WSJ | 16 kHz | English | Clean close-mic read speech | 80 | | LDC: LDC93S6B (WSJ0) and LDC94S13B (WSJ1) | 1993 | Read speech | Same | 6-7% WER | | Same as train | 20k (CMU dict) |
RM | | English | Read transcript, limited vocab and grammar | | | LDC: LDC93S3A | 1987-1989 | Read speech | Same | 1-2% WER | | Predefined grammar | <1K RM dict |
Timit | 16 kHz | English | Read transcript, very limited grammar | | 630 | | 1986 | Read speech | Same | ~30-40% PER | | None | ~47 phones |
fisher_english | 8 kHz | English | Telephone speech, auto-transcribed (errorful transcriptions) | 1,600 | 5203 M, 7198 F | LDC. Speech: LDC2004S13, LDC2005S13; Transcripts: LDC2004T19, LDC2005T19 | 2004/2005 | CTS | Fisher (may overlap with train) | ~22% WER (DNN) | | LDC Fisher | CMU dict, size UNK |
Switchboard 1 | 8 kHz | English | CTS | 300 | | LDC. Train: LDC97S62, Mississippi State transcriptions; Eval: LDC2002S09 and LDC2002T43 | 1993/1997/2000 | CTS | CTS eval2000 (hub5) | ~10% WER (LSTM) | | Mississippi trans + (opt) Fisher | 30K (CMU dict) |
Switchboard 1 + Fisher | 8 kHz | English | CTS | see above | see above | see above | see above | CTS | eval2000, rt03 | ~12% eval2000; ~19% rt03 | | see above | see above |
Callhome Egyptian | | Egyptian Colloquial Arabic | CTS | 120 conv | | LDC. Speech: LDC97S45; Transcripts: LDC97T19; Lexicon: LDC99L22 | 1997 | CTS | hub5 Arabic: LDC2002S22, LDC2002T39 | 50-60% WER | | Train trans | LDC dict |
Corpus of Spontaneous Japanese | | Japanese | Mixed style, close-talking mic | 650 hours (240 hr train) | >1,400 | Unclear how to obtain; see http://www.ninjal.ac.jp/english/products/csj/ and http://pj.ninjal.ac.jp/corpus_center/csj/ | 2004 | Mixed | | 9-10% WER | | UNK | UNK |
Fisher Spanish / Callhome Spanish | | Caribbean Spanish | CTS | Fisher: 163 hrs; Callhome: 60 hrs? (120 30-min conv) | Fisher: 136; Callhome: | LDC. Fisher speech: LDC96S35; Fisher transcripts: LDC96T17; Callhome speech: LDC96S35; Callhome transcripts: LDC96T17 | Fisher: 2010; Callhome: 1996 | CTS | Kaldi subset of Fisher | 29-30% WER | | Fisher trans | LDC96L16 |
Gale Arabic Phase 2 | 16 kHz | Arabic | Broadcast conversational/report | 320 train, 9.3 test | | LDC2013S02, LDC2014S07, LDC2013S07, LDC2014T17, LDC2013T17, LDC2013T04 | Collected 2006/2007 | Broadcast conversational and report | | Report: 13% WER (LSTM); Conver: 28% WER (LSTM); Comb: 24% WER (LSTM) | | LDC2013T17, LDC2013T04, LDC2014T17 | http://alt.qcri.org/ |
Gale Mandarin | 16 kHz | Mandarin Chinese | Broadcast | 126 | | LDC2013S08, LDC2013T20 | 2006-2007 | Broadcast | Same as train | 17.5% WER [1] | | LDC2013S08, LDC2013T20 | Same as HKUST below |
hkust (EARS RT04F data, dev and train) [2] | 8 kHz | Mandarin Chinese | Telephone conversational | ~145 | ~873 | LDC2005S15, LDC2005T32 | 2004 | Conversational | Same as train | 33.5% CER | | Acoustic trans (very little) | Both English and Mandarin: CMU dict used for English, mdbg dict (http://www.mdbg.net) used for Mandarin |
librispeech [3] | 16 kHz | English | Read transcription | 100 - 960 (460 | F: 125-1128; M: 126-1167 | http://www.openslr.org/12/ | 2015 | Read trans | Librispeech | ~5% | | Large (books) | CMU (with Sequitur G2P) |
reverb | |||||||||||||
sprakbanken | | Danish | Read transcript? | 350 | | Free download: http://www.nb.no/sprakbanken/#ticketsfrom?lang=en | 2012 | Read/Dictation | Same as train | 14% WER | | NST provided | NST provided? |
vystadial_en [4] | 8 kHz | English | Telephone, dialog system | 41 | unk | Free | 2014 | Dialog sys | Same as train | ~11% WER (GMM/HMM) | | Train trans | CMU + 250 |
vystadial_cz [4] | 8 kHz | Czech | Telephone, dialog system | 15 | unk | Free | 2014 | Dialog sys | Same as train | ~50% WER (GMM/HMM) | | Train trans | Rule derived |
chime3 | 16 kHz | English | Read trans, simulated and real noise | 18 | WSJ0 + 4 | Not clear (Chime performers) | 2015 | Read transcript | Same as train (same channels!) | ~12% WER real (4 spkrs); ~12% WER simu | | Official WSJ0 5K trans | WSJ0 |
voxforge | 16 kHz | English | Read trans | >75 hrs | unk | Free (GPL) | 2008? | Read trans | unk | unk | | Train | CMU + G2P for OOV |
Tedlium | 16 kHz | English | Presentation/talk | 118 | 666 | Free download | 2014? | Presentation | Same as train | ~10% WER | | Cantab provided LM | Cantab provided dict |