// doc/examples.dox

// Copyright 2016 Fred Richardson Allen Guo

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

// http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

/**
 \page examples Examples included with Kaldi

 When you check out the Kaldi source tree (see \ref install), you will find many sets of example scripts in the egs/ directory. This table summarizes some key facts about some of those example scripts; however, it is not an exhaustive list.

Name	BW	Lang	Train Domain	Train Hours	Train Speakers	License and Availability	Year Released	Speech Style	Test Domain	Kaldi Approx. Perf.	LM Data	Lexicon
AMI	16k	English (+non-native)	Microphone: head-mike, single and multiple distance mikes	100	123 M 66 F	Free / Download http://groups.inf.ed.ac.uk/ami/corpus/	2014	Meeting room	Same as train no overlap(?)	~25% WER head (T)DNN ~45% WER distant (B)LSTM	AMI + (opt) Fisher	50K (CMU dict + kaldi sources)
Aspire		English	Conversational microphone developed on telephone	see Fisher			2015			30.8% WER (dev or eval?)
WSJ	16k	English	Clean close-mic read speech	80		LDC LDC93S6B (WSJ0) and LDC94S13B (WSJ1)	1993	Read speech	Same	6-7% WER	same as train	20k (CMU dict)
RM		English	read transcript limited vocab and grammar			LDC LDC93S3A	1987-1989	read speech	same	1-2% WER	predefined grammar	<1K RM dict
Timit	16k	English	read transcript very limited grammar		630		1986	read speech	same	~30-40% PER	none	~47 phones
fisher_english	8k	English	Telephone speech Auto-transcribed (errorful transcriptions)	1,600	5203 M 7198 F	LDC speech: LDC2004S13, LDC2005S13 transcript: LDC2004T19, LDC2005T19	2004/2005	CTS	Fisher (may overlap with train)	~22% WER (DNN)	LDC Fisher	CMU dict Size UNK
Switchboard 1	8k	English	CTS	300		LDC Train: LDC97S62 Mississippi State transcriptions Eval: LDC2002S09 and LDC2002T43	1993/1997/2000	CTS	CTS eval2000 (hub5)	~10% WER (LSTM)	Mississippi Trans + (opt) Fisher	30K (CMU dict)
Switchboard 1 + Fisher	8k	English	CTS	see above	see above	see above	see above	CTS	eval2000 rt03	~12% eval2000 ~19% rt03	see above	see above
Callhome Egyptian		Egyptian Colloquial Arabic	CTS	120 conv		LDC Speech : LDC97S45 Transcripts : LDC97T19 Lexicon : LDC99L22	1997	CTS	hub5 arabic LDC2002S22 LDC2002T39	50-60% WER	Train trans	LDC dict
Corpus of Spontaneous Japanese		Japanese	Mixed style Close-talking mic	650 hours (240 hr train)	>1,400	Unclear how to get this http://www.ninjal.ac.jp/english/products/csj/ http://pj.ninjal.ac.jp/corpus_center/csj/	2004	Mixed		9-10% WER	UNK	UNK
Fisher Spanish Callhome Spanish		Caribbean Spanish	CTS	Fisher: 163 hrs Callhome: 60 hrs? 120 30min conv	Fisher: 136 Callhome:	LDC Fisher speech : LDC2010S01 Fisher transcripts : LDC2010T04 Callhome Speech : LDC96S35 Callhome Transcripts : LDC96T17	Fisher: 2010 Callhome: 1996	CTS	Kaldi subset of Fisher	29-30% WER	Fisher trans	LDC96L16
Gale Arabic Phase 2	16k	Arabic	Broadcast Conversational/Report	320 train 9.3 test		LDC2013S02 LDC2014S07 LDC2013S07 LDC2014T17 LDC2013T17 LDC2013T04	Collected 2006/2007	Broadcast Conversational and Report		Report: 13% WER (LSTM) Conver: 28% WER (LSTM) Comb: 24% WER (LSTM)	LDC2013T17 LDC2013T04 LDC2014T17	http://alt.qcri.org/
Gale Mandarin	16k	Mandarin Chinese	Broadcast	126		LDC2013S08 LDC2013T20	2006-2007	Broadcast	Same as train	17.5% WER [1]	LDC2013S08 LDC2013T20	Same as HKUST below
hkust EARS RT04F data dev and train [2]	8k	Mandarin Chinese	Telephone Conversational	~145	~873	LDC2005S15 LDC2005T32	2004	Conversational	Same as train	33.5% CER	Acoustic trans (very little)	Both Eng and Man. CMU dict used for Eng; mdbg dict used for Man http://www.mdbg.net
librispeech [3]	16k	English	Read transcription	100 - 960 (460)	F: 125-1128 M: 126-1167	http://www.openslr.org/12/	2015	Read trans	Librispeech	~5%	Large (books)	CMU dict (with Sequitur G2P)
reverb
sprakbanken		Danish	Read transcript?	350		Free download http://www.nb.no/sprakbanken/#ticketsfrom?lang=en	2012	Read/Dictation	Same as train	14% WER	NST Provided	NST Provided?
vystadial_en [4]	8k	English	Telephone, dialog system	41	unk	Free	2014	Dialog sys	Same as train	~11% WER (GMM/HMM)	Train trans	CMU + 250
vystadial_cz [4]	8k	Czech	Telephone, dialog system	15	unk	Free	2014	Dialog sys	Same as train	~50% WER (GMM/HMM)	Train trans	Rule derived
chime3	16k	English	Read trans, simulated and real noise	18	WSJ0 + 4	Not clear (Chime performers)	2015	Read transcript	Same as train (same channels!)	~12% WER real (4 spkrs) ~12% WER simu	Official WSJ0 5K trans	WSJ0
voxforge	16k	English	Read trans	>75hrs	unk	Free GPL	2008?	Read trans	unk	unk	Train	cmu + g2p for oov
Tedlium	16k	English	Presentation/talk	118	666	Free download	2014?	Presentation	Same as train	~10% WER	Cantab provided LM	Cantab provided dict
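
The performance column above reports word, phone, or character error rates (WER, PER, CER). As a rough illustration of what those figures mean, the sketch below computes an error rate as the edit distance (substitutions + deletions + insertions) between a reference and a hypothesis token sequence, divided by the reference length. This is only a minimal sketch and not Kaldi's scoring code; the function name `error_rate` and the sample sentences are hypothetical.

```python
def error_rate(ref, hyp):
    """Edit distance between two token lists, divided by len(ref).

    Hypothetical helper for illustration; not part of Kaldi.
    """
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Word-level tokens give WER; character-level tokens give CER.
ref_words = "the cat sat on the mat".split()
hyp_words = "the cat sat on mat please".split()
wer = error_rate(ref_words, hyp_words)
cer = error_rate(list("hello"), list("hallo"))
print(f"WER = {100 * wer:.1f}%  CER = {100 * cer:.1f}%")
```

The same function yields PER when the tokens are phone labels. Because insertions are counted, an error rate can exceed 100% when the hypothesis is much longer than the reference.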

[1] Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur, "Audio Augmentation for Speech Recognition", Interspeech 2015.
[2] There should be more Mandarin data from rt04f - 50 hours of dev data I believe (see LDC2004E67, LDC2004E68). There should also be eval data. See https://www.ldc.upenn.edu/collaborations/past-projects/gale/data/gale-pubs.
[3] See http://www.danielpovey.com/files/2015_icassp_librispeech.pdf for details. Acoustic and language models are available online.
[4] See http://www.lrec-conf.org/proceedings/lrec2014/pdf/535_Paper.pdf.

*/