egs/callhome_egyptian/README.txt
About the Callhome Egyptian Arabic Corpus

    The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120
    unscripted telephone conversations between native speakers of Egyptian
    Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt.
    The dialect of ECA that this dictionary represents is Cairene Arabic.

    This recipe uses the speech and transcripts available through LDC. In
    addition, an Egyptian Arabic phonetic lexicon (also available via LDC)
    is used to get word-to-phoneme mappings for the vocabulary. The
    datasets are:

    Speech      : LDC97S45
    Transcripts : LDC97T19
    Lexicon     : LDC99L22

Each subdirectory of this directory contains the scripts for a sequence
of experiments.

  s5: This recipe is based on the WSJ s5 recipe. It works with the
      romanized version of the transcripts (available alongside the
      Arabic-script version in LDC97T19). In addition, it uses a phonetic
      lexicon. The recipe follows the Triphone+SGMM+SAT+fMLLR pipeline.
      It uses the data partitions specified by LDC in the corpus
      description.