Ensemble.stats.unified
,------------------------------------------------------------------------------.
|                  Composite Report of All Significance Tests                  |
|                                 For the Test                                 |
|                                                                              |
|    Test Name                                                 Abbrev.         |
|    ------------------------------------------------------    -------         |
|    Matched Pair Sentence Segment (Word Error)                MP              |
|    Signed Paired Comparison (Speaker Word Error Rate (%))    SI              |
|    Wilcoxon Signed Rank (Speaker Word Error Rate (%))        WI              |
|    McNemar (Sentence Error)                                  MN              |
|                                                                              |
|                                                                              |
|------------------------------------------------------------------------------|
| Test      ||                | lvc_hyp.ctm | lvc_hyp2.ctm  || Test            |
| Abbrev.   ||                |             |               || Abbrev.         |
|-----------++----------------+-------------+---------------++-----------------|
| MP        || lvc_hyp.ctm    |             | ~ 1.000       || MP              |
| SI        ||                |             | ~ 1.000       || SI              |
| WI        ||                |             | ~ 1.000       || WI              |
| MN        ||                |             | ~ 1.000       || MN              |
|-----------++----------------+-------------+---------------++-----------------|
| MP        || lvc_hyp2.ctm   |             |               || MP              |
| SI        ||                |             |               || SI              |
| WI        ||                |             |               || WI              |
| MN        ||                |             |               || MN              |
|------------------------------------------------------------------------------|
| These significance tests are all two-tailed tests with the null hypothesis   |
| that there is no performance difference between the two systems.             |
|                                                                              |
| The first column indicates if the test finds a significant difference        |
| at the level of p=0.05. It consists of '~' if no difference is               |
| found at this significance level. If a difference at this level is           |
| found, this column indicates the system with the higher value on the         |
| performance statistic utilized by the particular test.                       |
|                                                                              |
| The second column specifies the minimum value of p for which the test        |
| finds a significant difference at the level of p.                            |
|                                                                              |
| The third column indicates if the test finds a significant difference        |
| at the level of p=0.001 ("***"), at the level of p=0.01, but not             |
| p=0.001 ("**"), or at the level of p=0.05, but not p=0.01 ("*").             |
|                                                                              |
| A test finds significance at level p if, assuming the null hypothesis,       |
| the probability of the test statistic having a value at least as             |
| extreme as that actually found, is no more than p.                           |
`------------------------------------------------------------------------------'
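
The legend above describes how each cell of the comparison matrix is built
from a single p-value. As a rough illustration only (this is not code from
the tool that produced the report), the three column rules could be sketched
in Python as below. The function name significance_columns and its
parameters p_value and better_system are hypothetical names chosen for this
sketch; "no more than p" is read as "less than or equal to".

```python
def significance_columns(p_value, better_system=None):
    """Return the (first, second, third) cell entries described in the legend.

    Hypothetical helper: p_value is the two-tailed p-value of one test, and
    better_system is the name of the system with the higher value on that
    test's performance statistic (only used when p_value <= 0.05).
    """
    # First column: '~' when no difference is found at p=0.05,
    # otherwise the name of the system with the higher statistic.
    first = better_system if p_value <= 0.05 else "~"

    # Second column: the minimum p at which the difference is significant,
    # shown here with three decimals as in the report.
    second = f"{p_value:.3f}"

    # Third column: star rating for the strictest level that is met.
    if p_value <= 0.001:
        third = "***"
    elif p_value <= 0.01:
        third = "**"
    elif p_value <= 0.05:
        third = "*"
    else:
        third = ""

    return first, second, third


if __name__ == "__main__":
    # The report's only populated cells have p = 1.000: no difference found.
    print(significance_columns(1.0))
    # An invented p-value, purely to show the other branches.
    print(significance_columns(0.004, "lvc_hyp.ctm"))
```

Under this reading, every populated cell in the matrix above ("~ 1.000")
corresponds to an empty third column: none of the four tests separates
lvc_hyp.ctm from lvc_hyp2.ctm at any of the listed levels.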