README
  ___ _____ __  __ _____ ____ ___    _      _     ___    _
 / _ \_   _|  \/  | ____|  _ \_ _|  / \    | |   |_ _|  / \
| | | || | | |\/| |  _| | | | | |  / _ \   | |    | |  / _ \
| |_| || | | |  | | |___| |_| | | / ___ \  | |___ | | / ___ \
 \___/ |_| |_|  |_|_____|____/___/_/   \_\ |_____|___/_/   \_\
#---------------#
# OTMEDIA LIA #
# README #
# version 1.0 #
#---------------#
DESCRIPTION
-----------
OTMEDIA stands for "Observatoire Transmedia" (Transmedia Observatory). Its main objective is to study the evolution and transformation of the media world.
The scientific objective of the project is to create a new generation of media observatory,
based on an interactive, semi-automatic transmedia analysis system, in order to understand
the world of information and its developments.
Web Site : http://www.otmedia.fr
The OTMEDIA LIA project is a set of tools to transcribe radio and TV shows.
It works in several steps :
- First pass : default transcription with speeral, plus speaker diarization.
- Second pass : speaker adaptation, followed by a second transcription pass with speeral.
- Confidence pass : compute confidence measures from the transcription output.
- Exploit Confidence Measure : use SOLR DB data to extend the lexicon where confidence measures are low, and create trigg files.
- Third pass : rerun the second pass using the new lexicon and trigg files.
DEPENDENCIES
------------
GNU Toolchain
Available from : http://www.gnu.org
and Debian packages
Compiling, linking, and building applications.
avconv (libav-tools >= 0.8)
Available from : http://libav.org
and Debian package
avconv is a very fast video and audio converter.
Java JDK and JRE ( >= 6)
Available from : http://www.oracle.com
and Debian packages
Java Development Kit and Java Runtime Environment.
Python ( >= 2.7.0)
Available from : http://www.python.org/
and Debian packages
Python is a programming language.
Perl ( >= 5.0.0)
Available from : http://www.perl.org/
and Debian packages
Perl is a programming language.
iconv ( >= 2.0.0)
Available from : http://www.gnu.org
and Debian package
Character set conversion.
csh shell (csh)
Available as a Debian package.
The C shell was originally written at UCB to overcome limitations in the
Bourne shell. Its flexibility and comfort (at that time) quickly made it
the shell of choice until more advanced shells like ksh, bash, zsh or
tcsh appeared. Most of the latter incorporate features original to csh.
The SRI Language Modeling Toolkit (SRILM >= 1.6.0)
Available from : http://www.speech.sri.com/projects/srilm/
SRILM is a toolkit for building and applying statistical language models.
Tomcat ( >= 7.0.0)
Available from : http://tomcat.apache.org/
and Debian packages
Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies.
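As a quick sanity check for the dependencies listed above, the following sketch reports which commands are not found on the PATH. The command names (gcc, make, javac, ngram-count, ...) are assumptions about how each dependency is exposed on your system, not something install.sh is known to check. It does not verify version numbers, and Tomcat is left out because it is usually run as a service rather than called from the PATH.

```shell
#!/bin/sh
# Print each command from the argument list that is not found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed command names for the dependencies; adjust them to your system.
# ngram-count is one of the SRILM binaries and is only found if the SRILM
# bin directory has been added to the PATH.
missing=$(missing_commands gcc make avconv java javac python perl iconv csh ngram-count)

if [ -n "$missing" ]; then
    echo "Commands not found on the PATH :"
    echo "$missing"
else
    echo "All checked commands were found."
fi
```

An empty report only means the commands exist; the minimum versions given above still have to be checked by hand.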
INSTALLATION
------------
See the INSTALL file for the installation procedure.
Quick install below.
Before launching the installation :
Be certain that all dependencies are satisfied.
Have 300 GB of free space for a complete install.
Issue the following commands in the shell :
$> ./install.sh
$> export OTMEDIA_HOME=path/to/OTMEDIA/directory
Read SOLR.INSTALL part 3 to install SOLRDB.
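The free-space requirement can be checked before running install.sh. The sketch below is only an illustration: the helper names free_gb and has_free_space are hypothetical, and it assumes the install goes into the directory named by OTMEDIA_HOME, or the current directory when that variable is not set. Sizes are computed in units of 1024^3 bytes, so the 300 GB threshold is approximate.

```shell
#!/bin/sh
# Print the free space, in whole GB, on the filesystem holding directory $1.
free_gb() {
    # df -P prints one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the filesystem holding directory $1 has at least $2 GB free.
has_free_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Assumption : the install target is OTMEDIA_HOME, else the current directory.
target=${OTMEDIA_HOME:-.}

if [ ! -d "$target" ]; then
    echo "$target is not a directory."
elif has_free_space "$target" 300; then
    echo "Enough free space in $target."
else
    echo "Less than 300 GB free in $target."
fi
```

Since the export line above uses a placeholder path, giving OTMEDIA_HOME an absolute path keeps it valid whatever the working directory is.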
RUNNING
-------
See HOWTO file.
ACKNOWLEDGEMENTS
----------------
Many thanks to Jean-François Rey for his help and for the work done.
KNOWN BUGS
----------
Many.
For bug reports, please contact Pascal Nocera at pascal.nocera@univ-avignon.fr
COPYRIGHT
---------
See the COPYING file.
AUTHORS
-------
Jean-François Rey <jean-francois.rey@univ-avignon.fr>
Hugo Mauchrétien <hugo.mauchretien@univ-avignon.fr>
Emmanuel Ferreira <emmanuel.ferreira@univ-avignon.fr>