  ___ _____ __  __ _____ ____ ___    _      _     ___    _
 / _ \_   _|  \/  | ____|  _ \_ _|  / \    | |   |_ _|  / \
| | | || | | |\/| |  _| | | | | |  / _ \   | |    | |  / _ \
| |_| || | | |  | | |___| |_| | | / ___ \  | |___ | | / ___ \
 \___/ |_| |_|  |_|_____|____/___/_/   \_\ |_____|___/_/   \_\

#---------------#
#  OTMEDIA LIA  #
#    README     #
#  version 1.0  #
#---------------#

DESCRIPTION
-----------

OTMEDIA stands for "Observatoire Transmedia" (Transmedia Observatory). Its
main objective is to study the evolution and transformation of the media
world. The scientific objective of the project is to create a new generation
of media observatory, based on an interactive, semi-automatic transmedia
analysis system, in order to understand the world of information and its
developments.

Web Site : http://www.otmedia.fr

The OTMEDIA LIA project is a set of tools for transcribing radio and TV
shows. It runs the following passes :

    - First pass : default transcription with Speeral and speaker
      diarization.
    - Second pass : speaker adaptation and a second transcription pass with
      Speeral.
    - Confidence pass : compute confidence measures from the transcription
      output.
    - Exploit Confidence Measure : use SOLR DB data to extend the lexicon
      where confidence measures are low, and create trigg files.
    - Third pass : repeat the second pass using the new lexicon and trigg
      files.

From GIT : http://gitlia.univ-avignon.fr/jean-francois.rey/otmedia

DEPENDENCIES
------------

GNU Toolchain
    Available from : http://www.gnu.org and Debian packages
    Compiling, linking, and building applications.
    (g++ will be needed if you install the scoring tools)

avconv (libav-tools >= 0.8)
    Available from : http://libav.org and Debian package
    avconv is a very fast video and audio converter.

Java JDK and JRE ( >= 6)
    Available from : http://www.oracle.com and Debian packages
    Java Development Kit and Java Runtime Environment.

Python ( >= 2.7.0)
    Available from : http://www.python.org/ and Debian packages
    Python is a programming language.

Perl ( >= 5.0.0)
    Available from : http://www.perl.org/ and Debian packages
    Perl is a programming language.

iconv ( >= 2.0.0)
    Available from : http://www.gnu.org and Debian package
    Character set conversion.

csh shell (csh)
    Available as a Debian package.
    The C shell was originally written at UCB to overcome limitations in
    the Bourne shell. Its flexibility and comfort (at that time) quickly
    made it the shell of choice until more advanced shells like ksh, bash,
    zsh or tcsh appeared. Most of the latter incorporate features original
    to csh.

The SRI Language Modeling Toolkit (SRILM >= 1.6.0)
    Available from : http://www.speech.sri.com/projects/srilm/
    SRILM is a toolkit for building and applying statistical language
    models.

Tomcat ( >= 7.0.0)
    Available from : http://tomcat.apache.org/ and Debian packages
    Apache Tomcat is an open source software implementation of the Java
    Servlet and JavaServer Pages technologies.

libxml2-dev ( >= 2.7 ) [needed for scoring only]
    Available from : http://www.xmlsoft.org/ and Debian packages
    Libxml2 is the XML C parser and toolkit.

INSTALL
-------

See the INSTALL file for the full installation procedure. A quick install
is described below.

Before launching the installation :
    - Make sure that all dependencies are satisfied.
    - Have 300 GB of free disk space for a complete install.

Issue the following commands in the shell :
    $> ./install.sh
    $> export OTMEDIA_HOME=path/to/OTMEDIA/directory

Read part 3 of SOLR.INSTALL to install SOLRDB.

RUNNING
-------

See the HOWTO file.

ACKNOWLEDGEMENTS
----------------

Many thanks to Jean-François Rey for his useful help and the work done.

KNOWN BUGS
----------

Many.
To report a bug, please contact Pascal Nocera at
pascal.nocera@univ-avignon.fr

COPYRIGHT
---------

See the COPYING file.

AUTHORS
-------

Jean-François Rey
Hugo Mauchrétien
Emmanuel Ferreira
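
EXAMPLE : PRE-INSTALL CHECK
---------------------------

The INSTALL section asks that all dependencies be satisfied before running
install.sh. A minimal sketch of such a check is given below. It is not part
of the OTMEDIA LIA distribution : the function name missing_commands and the
list of command names (gcc, make, avconv, java, python, perl, iconv, csh,
and ngram-count for SRILM) are assumptions derived from the DEPENDENCIES
section. The sketch only looks for commands on the PATH; it checks neither
version numbers nor Tomcat and libxml2-dev.

```shell
#!/bin/sh
# Hypothetical pre-install check; NOT shipped with OTMEDIA LIA.

# Print, one per line, each command from the argument list
# that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Command names are assumptions based on the DEPENDENCIES section;
# ngram-count is one of the tools installed by SRILM.
missing=$(missing_commands gcc make avconv java python perl iconv csh ngram-count)

if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing commands:" $missing
else
    echo "All listed commands were found."
fi

# The quick install asks for OTMEDIA_HOME to be exported after install.sh.
[ -n "${OTMEDIA_HOME:-}" ] || echo "OTMEDIA_HOME is not set."
```

The script only reports what it finds and changes nothing on the system, so
it can be run both before ./install.sh and again after exporting
OTMEDIA_HOME.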