<!-- $Id: revis.htm,v 1.6 2004/08/30 15:10:38 jfiscus Exp $ -->
<HTML><HEAD>
<TITLE>SCLITE revisions</TITLE>
</HEAD>
<BODY><p><hr>
<H1>
<A NAME="revisions_name_0">
<strong>
<A HREF="sclite.htm#sclite_name_0">Sclite</A> Revision.txt </A>
</strong>
</H1>
<p>
<pre>
sclite 1.0 - Released July 27, 1995
sclite 1.1 - Released September 27, 1995
- New/modified output options:
* Added options to '-o': 'none' to not make any reports,
'sgml' to create an sgml file for alignments, 'lur' for the
labeled utterance report.
* '-p' pipes the alignment output to other sclite utilities
in the sgml format.
- New Input options:
* '-P' accepts piped sgml format input from other sclite utilities.
* '-e' identifies the input character encoding.
- New alignment options:
* '-S' performs an inferred word segmentation algorithm rather
than using the word segmentation of the reference and hyp files.
* '-F' aligns fragments to words with matching substrings and scores
them as correct.
* Changed the -c option to include the optional flag "ASCIITOO"
which also splits ascii words when doing a character alignment.
Also added another flag, "DH", to delete hyphens from the ref and
hyp transcripts before alignment.
- Fixes and Changes:
* Modified the '-n' option to handle multiple hyp files.
* Fixed a bug in 'parse_stm_line' to handle empty texts.
* Modified the read function for a CTM file so that any length
file will be properly read in.
- Compiled and tested using the HP-UX and DEC OSF1 native cc
compilers.
sclite 1.2 - Released March 8, 1996
- Corrected a bug in the lur report that was activated if a speaker
had no reference words but had erroneously hypothesized words.
- Added the sent, spk, and ovrdtl reports to sclite.
- Added the option to score CTM to CTM files. This is essentially
the same code used for the first SWB LVCSR evaluation. However, because
the new network alignment routines unify the alignment into a single
step, alignments will differ slightly from those generated with the
old scoring package.
- Added the "-T" option to do time-mediated alignments.
- Removed the size limitations in the report generation software,
'rpg.c'. There is still a hard limit of 200 characters on the length
of each cell.
- Standardized program exit codes to be 0 for successful execution
and 1 for failed execution.
- Corrected the handling of NULL alternatives in the hypothesis file.
Scoring reference to hypothesis yields the same error rates as
scoring hypothesis to reference. The only difference is insertions
are swapped with deletions.
- The installer now has the option to enable or disable alignments
via GNU's diff.
- Added informative error messages when label definitions, which are
used by the 'lur' report, have been improperly specified.
sclite 1.2a - Released March 15, 1996
- Forgot one minor file in the distribution, "sclite.c".
sclite 1.3 - Released April 22, 1996
- Corrected a minor makefile inconsistency. (One file was compiled
twice).
- Changed Network_dp_align to optionally include NULLS in the output.
- Changed the -m option to now reduce either the reference or
hypothesis file, or both, before alignment takes place.
- Fixed an uninitialized variable in alex.c which became apparent
in the 'dtl' and 'spk' reports.
- Corrected an argument passed to fill_STM_structure() in stm2ctm.c
which caused a warning on some compilers.
- Added a bug report procedure.
Revision 1.4 - Released October 18, 1996
- Forced confidence values to flow through the entire data pipeline.
- Added the '-C' option to include 'normalized cross-entropy'
statistics in all output files.
- Added algo2 for the inferred segmentation option '-S'
- Added "IGNORE_TIME_SEGMENT_IN_SCORING" as an allowable
transcript for an stm record. See the stm file documentation for
its use.
Revision 1.4a - Released May 29, 1997
- Cleaned the distribution to be ISO-9669 compatible.
Released under a different name, sctk Version 1.0
- Modified the label extraction function 'parse_input_comment_line'
to ignore duplicate LABEL and CATEGORY lines.
- Added a sequence number to each PATH in alignment sequence so
that the input sequence of alignments can be reconstructed.
- Added the capability to keep track of reference confidence scores
when aligning ref ctm's against hyp ctm's.
- Corrected the .pre dump of the alignment structure when the case
sensitive flag is set. The error was introduced by modifications.
- Fixed a problem in TEXT_strcasecmp(). It failed to handle the
case where str1 was shorter than str2.
- Fixed a problem in 'align.c/extract_speaker()': a NULL was not
terminating each newly extracted speaker id.
- Revised the reports lut, sum, snt, spkr, ovr to handle speakers without
any reference tokens. In the sum report, the speakers without
reference tokens are ignored when computing the speaker
mean, sd, and median.
- Fixed a bug in tcslite.sh which output an error when test 5 was
run and the use of gnudiff was not compiled into sclite.
- Fixed a bug in config.in which was propagated to config.sh. The
problem was a missing backquote on "uname -s".
- Added error checking to the ctm2ctm alignment module. No checking
had been performed to make sure the ref and hyp files had the
same conversations and channels.
- Fixed a problem in 'expand_words_to_chars()': it was not deleting
hyphens from single-character words due to an incorrect conditional.
- Added a new way to score, 'Optionally Deletable'. This required a
major set of modifications and generalizations.
- Modified the character scoring procedure so that confidence scores
are imputed to the sub-characters making up the word.
- Corrected a bug in Compute_ROC:det.c which incorrectly incremented
pointers.
SCTK Version 1.1 - Released November 13, 1997
- Utility versions in this release: sclite V2.1, sc_stats V1.1
- Added the Executive and Raw Executive Summaries to sc_stats.
- Added the det curve to sc_stats so that combined plots are
produced.
- Modified the mapsswe test to handle an arbitrary number of segments.
- Corrected a bug in mtchprs.c which was freeing the test
confidence array prematurely.
SCTK Version 1.2
- Added the prn report to sc_stats, which prints N-system alignments together.
- Added the option to perform word-weight-mediated alignments.
- Weight inputs include wwl file (-w) and LM file (-L).
- Added testing scripts and documentation examples.
- Added the .wws output format.
- Updated .prf output to include word weights and other information.
- Added SLM toolkit v2 to the sctk package.
- Modified config.in, makefile.in and the installation process.
- Various internal structures modified to handle word weights.
- Compiles under Linux using gmake.
- Documentation changes, including additional comments concerning the
waveform id in the STM and CTM file formats.
SCTK Version 1.2a
- Fixed an installation problem for Linux involving scfp.
SCTK Version 1.2b - Released October 1, 2000
- Improved testing code to not report errors under Linux
SCTK Version 1.2c - Released October 11, 2000
- Improved installation targets in makefile
SCTK Version 1.3 - Released July 30, 2004
- Minor bug fixes for core dumps
- Added the ability to pass two tags attached to each word through the
scorer. The tags are attached to the words by appending ';<string>'
to the word's text. There can be up to two tags, and they may be empty.
- Added a '#' after NCE values in the .sys reports to indicate the
absence of reference lexemes for a speaker.
- Expanded the buffers in the rpg.c suite of routines for report generation.
- Expanded the maximum alternation size to 10000 characters.
- Added a "Lattice" error rate calculation in the .prn reports. It's the
percent of reference tokens not correct in any system's transcript.</pre>
</body>
</html>