egs/mgb5/README 542 Bytes
8dcb6dfcb   Yannick Estève   first commit
  ###
  # MGB-5 corpus: Moroccan Arabic Automatic Speech Recognition
  # Created in collaboration between QCRI and ELRA
  # More details can be found here: https://arabicspeech.org/mgb5
  ###
  
  
  ## INTRODUCTION ##
  Training data: 10.2 hours from 69 programs
  Development data: 1.8 hours from 10 programs
  Testing data: 2.0 hours from 14 programs
  
  ## KNOWN ISSUES ##
  1- The dev data is not aligned consistently across the four annotators
  2- Once the alignment is consistent, we can include multi-reference word error rate
  3- Use MGB-2 as a background model
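
For issue 2, the following is a minimal sketch of one possible way to score against several references once the four annotations share the same segments. It is an assumption for illustration, not the scoring used by this recipe or the official MGB-5 metric: for each utterance it keeps the reference with the fewest word errors and accumulates errors and reference lengths over the set. The function names `edit_distance` and `min_reference_wer` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def min_reference_wer(references: List[List[str]], hypotheses: List[str]) -> float:
    """WER where each utterance is scored against its closest reference.

    references[i] holds all annotator transcripts for utterance i
    (assumed to cover the same segment); hypotheses[i] is the ASR output.
    """
    total_errors = 0
    total_words = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_words = hyp.split()
        ref_word_lists = [r.split() for r in refs]
        # Keep the annotator transcript that yields the fewest errors.
        best = min(ref_word_lists, key=lambda r: edit_distance(r, hyp_words))
        total_errors += edit_distance(best, hyp_words)
        total_words += len(best)
    return total_errors / total_words if total_words else 0.0
```

This only makes sense after issue 1 is resolved, since it pairs each hypothesis with all transcripts of the same segment.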