About farsdat:

  Available as ELRA corpus ELRA-S0112, farsdat is the Persian counterpart
  of TIMIT. The ELRA catalog description reads:

  "The farsdat corpus of read speech is designed to provide speech data 
   for acoustic-phonetic studies and for the development and evaluation
   of Persian automatic speech recognition systems. It contains broadband
   recordings of 304 speakers of ten major dialects of the Iranian Farsi language,
   each reading ten phonetically rich sentences. The farsdat corpus includes
   time-aligned orthographic, phonetic and word transcriptions as well as 
   a 16-bit, 22050Hz speech waveform file for each utterance."

  s5: the currently recommended recipe.