trn - Definition of a transcript input file
The transcript format is a file of word sequence records separated by newlines. Each record contains a word sequence followed by an utterance ID enclosed in parentheses. See the '-i' option for a list of accepted utterance ID types.
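A single trn record can be split into its word sequence and utterance ID as in the minimal Python sketch below. It assumes the utterance ID is the last parenthesized group on the line, and it does not interpret transcript alternations or the "@" null word; the function name `parse_trn_record` is hypothetical.

```python
import re

# Assumption: the utterance ID is the final "( ... )" group on the line.
_TRN_RE = re.compile(r"^(?P<words>.*?)\s*\((?P<uttid>[^()]*)\)\s*$")


def parse_trn_record(line: str) -> tuple[list[str], str]:
    """Split one trn record into (word sequence, utterance ID)."""
    match = _TRN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a trn record: {line!r}")
    # Words are whitespace separated; alternations are NOT expanded here.
    return match.group("words").split(), match.group("uttid")


words, utt_id = parse_trn_record("the cat sat on the mat (spk1_utt01)")
print(words, utt_id)
```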
Transcript alternations, described above, can be used in the word sequence by using this BNF format:
The "@" represents a NULL word in the transcript. For scoring purposes, an error is not counted if the "@" is aligned as an insertion.
txt - Definition of a text input file
stm - Definition of a segment time mark input file
This describes the segment time marked files to be used for scoring the output of speech recognizers via the NIST sclite() program. This is a reference file format.
The segment time mark file consists of a concatenation of text segment records from a waveform file. Records are separated by newlines, and each contains: the waveform's filename and channel identifier [A | B], the talker's ID, the begin and end times (in seconds), an optional subset label, and the text of the segment. Each record follows this BNF format:
STM :== <F> <C> <S> <BT> <ET> [ <LABEL> ] transcript . . .
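The BNF above maps naturally onto a small record type. The following Python sketch is illustrative only: the class and function names are hypothetical, and it assumes the optional label is written as a single whitespace-free token in angle brackets (for example "<O,F0,MALE>"), so a transcript whose first word happens to look like that would be misread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StmSegment:
    filename: str          # <F>  waveform file name
    channel: str           # <C>  channel identifier, e.g. "A" or "B"
    speaker: str           # <S>  talker ID
    begin: float           # <BT> begin time in seconds
    end: float             # <ET> end time in seconds
    label: Optional[str]   # optional <LABEL>, None when absent
    words: list[str]       # transcript words (alternations not expanded)


def parse_stm_record(line: str) -> StmSegment:
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"STM record needs at least 5 fields: {line!r}")
    f, c, s, bt, et, *rest = fields
    label = None
    # Assumption: a leading "<...>" token is the subset label.
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        label, rest = rest[0], rest[1:]
    return StmSegment(f, c, s, float(bt), float(et), label, rest)


seg = parse_stm_record("sw02001 A sw02001-A 0.00 1.25 <O,F0,MALE> hello there")
print(seg)
```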
The list of words can contain a transcript alternation using the following BNF format:
The "@" represents a NULL word in the transcript. For scoring purposes, an error is not counted if the "@" is aligned as an insertion.
When the string "IGNORE_TIME_SEGMENT_IN_SCORING" is used as the transcript, the process that chops the hypothesis file into segments matching the reference segments ignores all hypothesis words whose time midpoints fall within that reference segment's beginning and ending times. The effect is to declare the segment's time region "out-of-bounds" for scoring, so no errors are generated from it.
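The midpoint rule can be sketched in a few lines of Python. This is not sclite's code: it assumes a single waveform and channel, hypothesis words given as (begin, duration, word) tuples, reference segments as (begin, end, transcript) tuples, and inclusive segment boundaries. The function name is hypothetical.

```python
IGNORE = "IGNORE_TIME_SEGMENT_IN_SCORING"


def drop_ignored_hyp_words(hyp_words, ref_segments):
    """Remove hypothesis words whose time midpoint falls in an ignored segment."""
    # Collect the time spans of reference segments marked as out-of-bounds.
    ignored = [(bt, et) for bt, et, text in ref_segments if text.strip() == IGNORE]
    kept = []
    for begin, dur, word in hyp_words:
        mid = begin + dur / 2.0
        # Assumption: boundaries are inclusive on both ends.
        if not any(bt <= mid <= et for bt, et in ignored):
            kept.append((begin, dur, word))
    return kept


hyp = [(0.0, 0.4, "a"), (1.0, 0.4, "b"), (2.5, 0.2, "c")]
refs = [(0.0, 0.9, "a"), (0.9, 2.0, IGNORE), (2.0, 3.0, "c")]
print(drop_ignored_hyp_words(hyp, refs))
```

Here "b" (midpoint 1.2 s) falls inside the ignored 0.9 to 2.0 s region and is dropped before alignment, so it can produce neither an insertion nor a substitution.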
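The file must be sorted by the first and second columns in ASCII order, and by the fourth column in numeric order. The UNIX sort command "sort +0 -1 +1 -2 +3nb -4" will sort the records into the appropriate order; with the current POSIX key syntax the equivalent is "LC_ALL=C sort -k1,1 -k2,2 -k4,4n".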
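The same ordering can be produced in Python with a tuple sort key, sketched below under the assumption that the input contains only record lines (no comments or blank lines) and ASCII file and channel names, for which Python's code-point string comparison matches ASCII order. The function names are hypothetical.

```python
def stm_sort_key(line: str):
    fields = line.split()
    # Columns 1 and 2 compare as strings; column 4 (begin time) compares numerically.
    return (fields[0], fields[1], float(fields[3]))


def sort_stm_lines(lines):
    return sorted(lines, key=stm_sort_key)


lines = [
    "b A s 5.0 6.0 x",
    "a B s 1.0 2.0 y",
    "a A s 10.0 11.0 z",
    "a A s 2.0 3.0 w",
]
print(sort_stm_lines(lines))
```

Converting the begin time with `float` is what places "2.0" before "10.0"; a plain string comparison would reverse them.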
Lines beginning with ';;' are considered comments and are ignored. Blank lines are also ignored.
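A reader for either STM or CTM files can discard comments and blank lines before parsing. This minimal Python sketch treats only lines that literally start with ";;" as comments; the generator name is hypothetical.

```python
def record_lines(text: str):
    """Yield the lines of an STM or CTM file that are neither comments nor blank."""
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;"):
            continue  # blank line or ';;' comment
        yield line


sample = ";; header comment\n\nsw02001 A sw02001-A 0.0 1.0 hi\n;; another\n"
print(list(record_lines(sample)))
```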
Each position within the label field, with positions separated by commas, defines a group of subsets that are presented separately in the generated reports. For instance, the first group might be all segments, the second might be either male or female, and the third might be the story. The example below shows an STM file encoded with this information.
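To see how positional label groups become separate report subsets, the Python sketch below groups segment indices by the value found at each comma-separated position. It assumes labels are written like "<O,F,STORY1>" with surrounding angle brackets; the function name and the label values are hypothetical.

```python
from collections import defaultdict


def group_by_label_position(labels):
    """Return one dict per label position, mapping subset value -> segment indices."""
    groups = []
    for idx, label in enumerate(labels):
        parts = label.strip("<>").split(",")
        # Grow the list of groups if this label has more positions than seen so far.
        while len(groups) < len(parts):
            groups.append(defaultdict(list))
        for pos, value in enumerate(parts):
            groups[pos][value].append(idx)
    return [dict(g) for g in groups]


labels = ["<O,F,STORY1>", "<O,M,STORY1>", "<O,F,STORY2>"]
for position, subsets in enumerate(group_by_label_position(labels)):
    print(position, subsets)
```

Position 0 holds a single "all segments" subset, position 1 splits by sex, and position 2 splits by story, mirroring the three groups described above.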
ctm - Definition of time marked conversation scoring input
This describes the time marked conversation input files to be used for scoring the output of speech recognizers via the NIST sclite() program. Both the reference and hypothesis input files can share this format.
The ctm file format is a concatenation of time mark records for each word in each channel of a waveform. The records are separated with a newline. Each word token must have a waveform id, channel identifier [A | B], start time, duration, and word text. Optionally, a confidence score can be appended for each word. Each record follows this BNF format:
CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]
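A CTM record can be parsed field by field following the BNF above. In this Python sketch the names are hypothetical; a "*" time field, which the alternation tags described below use, is mapped to None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CtmToken:
    waveform: str                # <F>    waveform id
    channel: str                 # <C>    channel identifier
    begin: Optional[float]       # <BT>   start time in seconds, None for "*"
    duration: Optional[float]    # <DUR>  duration in seconds, None for "*"
    word: str                    # word text
    confidence: Optional[float]  # optional <CONF>


def _time(field: str) -> Optional[float]:
    return None if field == "*" else float(field)


def parse_ctm_record(line: str) -> CtmToken:
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"CTM record needs 5 or 6 fields: {line!r}")
    f, c, bt, dur, word = fields[:5]
    conf = float(fields[5]) if len(fields) == 6 else None
    return CtmToken(f, c, _time(bt), _time(dur), word, conf)


print(parse_ctm_record("sw02001 A 12.34 0.25 hello 0.87"))
```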
The file must be sorted by the first three columns: the first and second in ASCII order, and the third in numeric order. The UNIX sort command "sort +0 -1 +1 -2 +2nb -3" will sort the words into the appropriate order; with the current POSIX key syntax the equivalent is "LC_ALL=C sort -k1,1 -k2,2 -k3,3n".
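Before scoring, it can be useful to confirm that a CTM file already satisfies this ordering. The Python sketch below checks adjacent records; it assumes comments and blank lines were removed beforehand and skips records whose start time is "*" (the alternation tags described below). The function name is hypothetical.

```python
def ctm_is_sorted(lines) -> bool:
    """True if records are ordered by (waveform, channel, numeric start time)."""
    keys = []
    for line in lines:
        f, c, bt = line.split()[:3]
        if bt == "*":
            continue  # alternation tags carry no start time
        keys.append((f, c, float(bt)))
    return all(a <= b for a, b in zip(keys, keys[1:]))


print(ctm_is_sorted(["a A 0.5 0.1 x", "a A 2.0 0.1 y", "a A 10.0 0.1 z"]))
```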
Lines beginning with ';;' are considered comments and are ignored. Blank lines are also ignored.
Included below is an example:
For CTM reference files, a format extension exists to permit marking alternate transcripts. The alternation uses the same file format as described above, except that three word strings, "<ALT_BEGIN>", "<ALT>" and "<ALT_END>", delimit the alternation. Each tag is treated as a word, with a conversation ID, a channel, and "*" in place of the begin time and duration.
The alternation begins with the word "<ALT_BEGIN>" and ends with the word "<ALT_END>". Between them are at least two alternative time-marked word sequences separated by the word "<ALT>". Each word sequence can contain any number of words; an empty alternative signifies a null word.
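The tag structure can be sketched as a small Python function that turns the word column of a CTM reference into plain words and alternation groups. It works on word strings only (times are ignored), does not support nested alternations, and its name is hypothetical; an empty list inside a group represents the null-word alternative.

```python
def split_alternations(words):
    """Return a list of plain words and alternation groups (lists of word lists)."""
    out = []
    i = 0
    while i < len(words):
        if words[i] != "<ALT_BEGIN>":
            out.append(words[i])
            i += 1
            continue
        alternatives = [[]]
        i += 1
        # Collect words until <ALT_END>, starting a new alternative at each <ALT>.
        while i < len(words) and words[i] != "<ALT_END>":
            if words[i] == "<ALT>":
                alternatives.append([])
            else:
                alternatives[-1].append(words[i])
            i += 1
        if i == len(words):
            raise ValueError("<ALT_BEGIN> without matching <ALT_END>")
        if len(alternatives) < 2:
            raise ValueError("an alternation needs at least two alternatives")
        out.append(alternatives)
        i += 1  # skip <ALT_END>
    return out


print(split_alternations(["<ALT_BEGIN>", "uh", "<ALT>", "um", "<ALT>", "<ALT_END>", "yes"]))
```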
Below is an example alternate reference transcript for the words "uh" and "um".