SCLITE Command Line Options

Sclite Commandline Options

The commandline options for sclite can be broken into four categories:

Define the character encoding used for the text portion input ref and hyp files. The flag "gb" stands for GB encoded Chinese and "euc" stands for EUC encoded Japanese. Both encodings are 2-byte per character encodings. The default, is extended ASCII.

-h

The -h option may be used more than once to align multiple files.

-i [ wsj | atis | rm | swb | spu_id ]

trn

wsj -: for Wall Street Journal and CSRNAB
atis -: for ATIS3
rm | swb | spu_id -: are synonyms which refer to generic utterance id formats whereby the utterance id is made up of a speaker code, followed by a hyphen or underscore, followed by an utterance number.

This option is only required for aligning transcript inputs (trn). TBD

-P

Alignments are read from 'stdin' as input to sclite. The format of the input must be in the "sgml" output format, created either by '-o sgml' or by piped input from another sclite utility. No re-alignments are performed on the read in alignments, only scoring reports can be generated.

Interpret the text symbols as a right-to-left language such as Arabic. The default is to interpret text in a left-to-right fashion as in English.

Alignment Options:

-c [ NOASCII DH ]

Chop up the words into separate characters before doing the alignment. It is generally not the practice of the ARPA community to score at the character level. The intent of this option is to be able to score Mandarin Chinese at the character level. The option "NOASCII" does not separate characters if they are ASCII. The option "DH" deletes hyphens from the ref and hyp strings before alignment. This option only works using the DP alignment algorithm. (-c & -d are exclusive)

-d

GNU diff

-F

Perform the alignment using a cost function which counts fragments, words ending or beginning with a hyphen, as correct if the spelling up to the hyphen matches the spelling of the hypothesized word. Options -F and -d are exclusive.

-L LM

CMU-Cambridge Statistical Language Modeling Toolkit v2

idngram2lm

The language model is used to compute an individual weight for each word in the reference and hypothesis strings. The weight is defined to be Log₂(P(word|context)). Each pair of aligned strings is considered to be independent, so therefore, there is no context for initial words in each pair.

The word-weights are used in two ways, first as a method to define word-to-word distances for word-weight-mediated alignment and second to perform weighted word scoring .

Out-of-Vocabulary words get the default weight of 20.0, and optionally deletable words get a default weight of 0.0.

-m [ ref | hyp ]

When this option is used, the alignment phase of scoring ignores any segment or word (depending on the option(s) used) which is not in the time span of the opposite file. The time span of a file is defined to be start time of the first time mark, to the end time of the last time mark.

The "ref" option reduces the reference segments to those which are within the hypothesis file time span.

The "hyp" option reduces the hypothesis words to those which are within the reference file tiem span.

Both "ref" and "hyp" may be used simultaneously.

The argument -m by itself defaults to '-m ref'. Exclusive with -d.

-s

Do Case-sensitive alignments. Otherwise all input is mapped to a single case before scoring. Of course, GB and UEX encode text data is never case-converted.

-S algo1 lexicon [ ASCIITOO ]

Convert the previously word-segmented reference string into a word network.
Covert the hypothesis text to a string of characters, each character representing a word. The data represented is then convert to a network.
```
ex.    * --- A --- * --- T --- * --- 0 --- *
```
Consider all possible sequences of letters through the network. If a sequence creates a word which is represented in the lexicon, add an arc to the network representing the word. The maximum characters per word is limited to the maximum word length in the lexicon.
```
                     ,-------- TO -------.
                    /                     \
ex.    * --- A --- * --- T --- * --- 0 --- *
        \                     /
         `------- AT --------'
```
DP Align the reference and hypothesis networks, and extract a minimal cost path.

The supplied "lexicon" must be a sorted list of word records, each separated by a newline. Only the first column, separated by whitespace, is read in and used for the lexicon. By default, the algorithm only separates hypothesis characters that are GB or EUC encoded. If the option "ASCIITOO" is used, ASCII hypothesis words are also converted to characters in step 2.

Exclusive with -d.

-S algo2 lexicon [ ASCIITOO ]

Exclusive with -d.

-T

main SCLITE

Options -F and -d are exlcusive.

-w wwl_file

word-weight-mediated alignment

weighted word scoring

If the supplied WWL filename is "unity", then no file of weights is read in. Instead, this is a shorthand notation to use a weight of 1.0 for all words.

Optionally deletable words get a default weight of 0.0, (even if "unity" is supplied as the WWL filename).

The format of the WWL file is as follows.

;; 'Headings' '<COL1>' '<COL2>' '<COL3>' ....

The second "special" comment line defines the default weight applied to out-of-vocabulary words if any exist. The format for this line is:

;; Default missing weight '<number>'

The remainder of the file consists of word records, each word record separated by a newline. The format of each record is:

<WORD_TEXT> <WEIGHT_1> <WEIGHT_2> . . .

NOTE: The current version of SCTK only utilizes the first weight.

Output Options:

-f level

As a well behaved program, reassure the user that the program is continuing to perform it's task by providing the user with some feedback. The feedback levels, defined by this option are: 0) no feedback, 1) processing feedback (i.e. status of text loading and alignments); 2) processing feedback plus printing out aligned strings. The feedback level defaults to 0 if no output options are specified using the '-o' option, otherwise it defaults to 1.

-l width

When printing the text alignments for the output option "pralign" wrap the lines at "width" characters. Default is 1000 characters.

-O output_dir

, write them into the directory "output_dir". If the output directory does not exist, all reports will be written to stdout.

-p

Write to standard out the resulting alignments so they can be piped to another sclite utility. The format of the output is the same as '-o sgml'. The options sets the feedback level, with '-f' to 0.

Scoring Report Options:

-C [ det | bhist | hist | none ]

ctm

Examples.

-n name

Writes all outputs using 'name' as a root filename instead of 'hypfile'. For multiple hypothesis files, the root filename is 'name'.'hypfile'

-o

sum -: Produce a summary of speaker performance in terms of Percents: Correct, Substitutions, Deletions, Insertions, Word Errors and Sentence (or Utterance) errors. System averages and speaker means, medians and standard deviations are computed for each percentage. If the report is not going to stdout, the output is placed in a file called "<hypfile>.sys". The options '-O' and '-n' can change the destination of the output file. Example
rsum -: Produce a summary similar to 'sum' except output word counts instead of percentages. If the report is not going to stdout, the output is placed in a file called .raw. The options '-O' and '-n' can change the destination of the output file. Example
wws -: Produce a summary similar to 'sum' except output weighted word error instead of word error. If the report is not going to stdout, the output is placed in a file called .wws. The options '-O' and '-n' can change the destination of the output file. Example
pralign - pra -: Produce a text copy of all the string alignments. If the report is not going to stdout, the output is placed in a file called .pra. The options '-O' and '-n' can change the destination of the output file. "pralign" and "pra" are synonyms. Example
prf -: Produce a text copy of all the string alignments similar to that produced by "pralign" except, include all relevant information concerning the alignments. That is, include in the output things like: word beginning and ending times, reference segment beginning and ending times, and hypothesis word confidence scores. Example
all -: Produces the three reports: "sum", "rsum", and pralign"
stdout -: Write all selected scoring reports to stdout. If the feedback level is not specified using the '-f' option, the feedback level is set to 0.
sgml -: Produce a dump of the text alignments in an sgml notation. The output consists of tags at the system, speaker, and sentence level. Text information is only present at the sentence level and consists a comma separated list of word alignments. The word alignments can be either of the following: C:"word" or I:"word" or D:"word" or S:"word1","word2" for correct, insertion, deletion and substitution respectively. If the report is not going to stdout, the output is placed in a file called .sgml. The options '-O' and '-n' can change the destination of the output file. Example
lur -: Produce a Labeled Utterance Report (LUR) based information in the reference STM file. (Note: only reference files in STM format support this option.) The LUR report is a report which tabulates overall error rate statistics and statistics over arbitrary subsets of the reference data, e.g. speaker's sex, audio characteristics. If the report is not going to stdout, the output is placed in a file called .lur. The options '-O' and '-n' can change the destination of the output file. Example
snt -: Produce a scoring report file for all utterance/segments of a speaker. Within each file, one per speaker, is a by-utterance error analysis which contains: the aligned text, error classification percentages and other statistics. If the report is not going to stdout, the output is placed in a file called .snt.. The options '-O' and '-n' can change the destination and name of the output file. Example
spk -: Produce a scoring report file summarizing the errors made on the speaker's utterances. Within each file, one per speaker id, is a summarization of utterance and word errors along with confusion pair, insertion, deletion, substitution and falsely recognized word lists. If the report is not going to stdout, the output is placed in a file called .spk.. The options '-O' and '-n' can change the destination and name of the output file. Example
dtl -: Produce a scoring report in the same format as the "spk" report using statistics gathered over the entire test set. If the report is not going to stdout, the output is placed in a file called .dtl. The options -'O' and '-n' can change the destination and name of the output file. Example
none -: Produce no output reports.

If this option is not specified, the default options are "sum" and "stdout". If the user wishes to have reports other than "sum" to be written to stdout, then the "stdout" flag must be used in the argument list. Options that are duplicated, have the effect of nullification. So for instance using the options "all pralign" is equivalent to "sum rsum". Defines the output reports. Default: 'sum stdout'