Commit 665a8dac322f0a4232d39c379136a945f4d76081
! follow the white rabbit !
1 | #---------------# | ||
2 | # OTMEDIA LIA # | ||
3 | # HOWTO # | ||
4 | # version 1.0 # | ||
5 | #---------------# | ||
6 | |||
7 | 1\ Main options | ||
8 | --------------- | ||
9 | |||
10 | There are five main options for otmedia scripts. | ||
11 | -h : display help | ||
12 | -D : debug mode | ||
13 | -v n : verbose level, from 1 (low) to 3 (high) | ||
14 | -c : check results | ||
15 | -r : force a script to rerun, without deleting work already done | ||
16 | |||
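As an illustration, a minimal getopts sketch of how a script could parse these common options (hypothetical, not the actual otmedia parser):

```shell
#!/bin/bash
# Hypothetical parser for the common otmedia options (illustration only).
DEBUG=0; VERBOSE=0; CHECK=0; RERUN=0
while getopts "hDv:cr" opt; do
  case $opt in
    h) echo "usage: $0 [-h] [-D] [-v n] [-c] [-r]"; exit 0 ;;
    D) DEBUG=1 ;;
    v) VERBOSE=$OPTARG ;;   # 1 (low) to 3 (high)
    c) CHECK=1 ;;
    r) RERUN=1 ;;
    *) exit 1 ;;
  esac
done
echo "DEBUG=$DEBUG VERBOSE=$VERBOSE CHECK=$CHECK RERUN=$RERUN"
```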
17 | 2\ Main scripts | ||
18 | --------------- | ||
19 | 2.1\ FirstPass.sh | ||
20 | ----------------- | ||
21 | |||
22 | FirstPass.sh performs speaker diarization and transcription of an audio file, converting it to WAV format (16000 Hz, 16 bits, mono) if not already done. | ||
23 | If a .SRT file is present in the same directory as the audio file, it is copied as well. | ||
24 | |||
25 | $> FirstPass.sh [options] 110624FR2_20002100.wav result_directory | ||
26 | |||
27 | Options: | ||
28 | -f n : number of forks for speeral | ||
29 | |||
30 | Output : result_directory/110624FR2_20002100/res_p1/ | ||
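If your input is not yet in that format, a hypothetical pre-conversion with avconv (the dependency listed in the README) could look like this; "show.mp3" and "show.wav" are invented names:

```shell
#!/bin/bash
# Hypothetical pre-conversion to the format FirstPass.sh expects
# (16000 Hz, 16-bit PCM, mono); input/output names are invented.
in="show.mp3"; out="show.wav"
if command -v avconv >/dev/null 2>&1 && [ -f "$in" ]; then
  # -ar: sample rate, -ac: channel count, -acodec pcm_s16le: 16-bit PCM
  avconv -i "$in" -ar 16000 -ac 1 -acodec pcm_s16le "$out"
  status="converted"
else
  status="skipped (avconv or $in missing)"
fi
echo "$status"
```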
31 | |||
32 | 2.2\ SecondPass.sh | ||
33 | ------------------ | ||
34 | |||
35 | SecondPass.sh performs speaker adaptation and a second transcription pass based on the first-pass output. | ||
36 | |||
37 | $> SecondPass.sh [options] result_directory/110624FR2_20002100/ | ||
38 | |||
39 | Options: | ||
40 | -f n : number of forks for speeral | ||
41 | |||
42 | Output : result_directory/110624FR2_20002100/res_p2/ | ||
43 | |||
44 | 2.3\ ConfPass.sh | ||
45 | ---------------- | ||
46 | |||
47 | ConfPass.sh computes confidence measures using the second- or third-pass output. | ||
48 | |||
49 | $> ConfPass.sh [options] result_directory/110624FR2_20002100/ <res_p2|res_p3> | ||
50 | |||
51 | Output : result_directory/110624FR2_20002100/conf/res_p2/scored_ctm/ | ||
52 | and result_directory/110624FR2_20002100.usf file | ||
53 | |||
54 | 2.4\ ExploitConfidencePass.sh | ||
55 | ----------------------------- | ||
56 | |||
57 | ExploitConfidencePass.sh exploits the confidence measures to : | ||
58 | - boost high-confidence zones | ||
59 | - find alternatives in low-confidence zones (using the SOLR DB) | ||
60 | - extend the lexicon | ||
61 | |||
62 | $> ExploitConfidencePass.sh [options] result_directory/110624FR2_20002100 | ||
63 | |||
64 | Output : result_directory/110624FR2_20002100/trigg/speeral | ||
65 | result_directory/110624FR2_20002100/LEX/speeral/_ext | ||
66 | |||
67 | 2.5\ ThirdPass.sh | ||
68 | ------------------ | ||
69 | |||
70 | ThirdPass.sh performs transcription using the SecondPass speaker adaptation, the ExploitConfidencePass trigg files, and the new lexicon. | ||
71 | |||
72 | $> ThirdPass.sh [options] result_directory/110624FR2_20002100/ | ||
73 | |||
74 | Options : | ||
75 | -f n : number of forks for speeral | ||
76 | |||
77 | Output : result_directory/110624FR2_20002100/conf/res_p3 | ||
78 | |||
79 | 2.6\ RecomposePass.sh | ||
80 | -------------------- | ||
81 | |||
82 | RecomposePass.sh copies results missing from the ThirdPass output, taking them from the Second and First Pass. | ||
83 | |||
84 | $> RecomposePass.sh [options] result_directory/110624FR2_20002100/ | ||
85 | |||
86 | Output : result_directory/110624FR2_20002100/res_all | ||
87 | |||
88 | 2.7\ ScoringRes.sh | ||
89 | ------------------ | ||
90 | |||
91 | ScoringRes.sh runs different scoring tools to score the results against the SRT file, if one exists. | ||
92 | |||
93 | $> ScoringRes.sh [options] result_directory/110624FR2_20002100/ | ||
94 | |||
95 | Output : result_directory/110624FR2_20002100/scoring | ||
96 | |||
97 | 2.8\ CheckResults.sh | ||
98 | -------------------- | ||
99 | |||
100 | CheckResults.sh parses the result directories to summarize the work already done. | ||
101 | |||
102 | $> CheckResults.sh [options] result_directory | ||
103 | |||
104 | Output : "Directory name #plp #res_p1 #treil_p2 #treil_p3 usf_p2 usf_p3" | ||
105 | #plp number of plp files | ||
106 | #res_p1 number of .res files at first pass | ||
107 | #treil_p2 number of .treil files at second pass | ||
108 | #treil_p3 number of .treil files at third pass | ||
109 | usf_p2 usf file from confidence pass result on second pass (OK|ERR|NAN) | ||
110 | usf_p3 usf file from confidence pass result on third pass (OK|ERR|NAN) | ||
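For example, a summary line (sample values fabricated) can be labeled field by field with awk:

```shell
#!/bin/bash
# Label the columns of a CheckResults.sh summary line with awk.
# The sample line below is fabricated for illustration.
line="110624FR2_20002100 42 42 40 40 OK NAN"
echo "$line" | awk '{ printf "dir=%s plp=%s res_p1=%s treil_p2=%s treil_p3=%s usf_p2=%s usf_p3=%s\n", $1, $2, $3, $4, $5, $6, $7 }'
```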
111 | |||
112 | 3\ OneScriptToRuleThemAll.sh | ||
113 | ---------------------------- | ||
114 | |||
115 | The script runs all OTMEDIA LIA passes in one call. | ||
116 | |||
117 | $> OneScriptToRuleThemAll.sh [options] 110624FR2_20002100.wav result_directory | ||
118 | |||
119 | Options : (default options are available) | ||
120 | -a Do every pass | ||
121 | -1 Do First pass | ||
122 | -2 Do Second pass | ||
123 | -3 Do Third pass | ||
124 | -C Do Confidence pass | ||
125 | -e Do Exploit Confidence pass | ||
126 | -R Do Recompose pass | ||
127 | -s Do Scoring pass | ||
128 | |||
129 |
INSTALL
File was created | 1 | #---------------# | |
2 | # OTMEDIA LIA # | ||
3 | # INSTALL # | ||
4 | # version : 1.0 # | ||
5 | #---------------# | ||
6 | |||
7 | OTMEDIA LIA ready to use ? Really ? | ||
8 | No ! Some features require manual configuration. | ||
9 | Let's see... | ||
10 | |||
11 | SUMMARY | ||
12 | ------- | ||
13 | |||
14 | 1\ Before installation | ||
15 | 2\ install.sh script | ||
16 | 3\ SOLR install | ||
17 | |||
18 | |||
19 | 1\ Before installation | ||
20 | ---------------------- | ||
21 | |||
22 | - Check and install dependencies. | ||
23 | - On a 64-bit architecture, be sure you can run 32-bit programs. | ||
24 | - Have 300 GB of free space. | ||
25 | - Have access to the network and the nyx server. | ||
26 | |||
27 | 2\ install.sh script | ||
28 | -------------------- | ||
29 | |||
30 | The install.sh script will do most of the work. | ||
31 | It will check dependencies and configure the pass tools. | ||
32 | By default it will do a complete install (300 GB). | ||
33 | |||
34 | You can modify its behavior by editing install.sh : | ||
35 | |||
36 | To disable lexicon adaptation using the SOLR DB, set EXPLOITCONFPASS to 0 (this accounts for most of the 290 GB). | ||
37 | To disable confidence measures, set CONFPASS to 0. | ||
38 | To disable the second and third passes, set PASS2 to 0. | ||
39 | |||
40 | Run install.sh and follow the white rabbit. | ||
41 | |||
42 | 3\ SOLR install | ||
43 | --------------- | ||
44 | |||
45 | The install.sh script downloads otmedia-2013-04.tar.gz and untars it in OTMEDIA_HOME/tools/SOLR/ . | ||
46 | See the SOLR.INSTALL file to install the OTMEDIA SOLR DB. | ||
47 | |||
48 | |||
64 |
README
1 | ___ _____ __ __ _____ ____ ___ _ _ ___ _ | 1 | ___ _____ __ __ _____ ____ ___ _ _ ___ _ |
2 | / _ \_ _| \/ | ____| _ \_ _| / \ | | |_ _| / \ | 2 | / _ \_ _| \/ | ____| _ \_ _| / \ | | |_ _| / \ |
3 | | | | || | | |\/| | _| | | | | | / _ \ | | | | / _ \ | 3 | | | | || | | |\/| | _| | | | | | / _ \ | | | | / _ \ |
4 | | |_| || | | | | | |___| |_| | | / ___ \ | |___ | | / ___ \ | 4 | | |_| || | | | | | |___| |_| | | / ___ \ | |___ | | / ___ \ |
5 | \___/ |_| |_| |_|_____|____/___/_/ \_\ |_____|___/_/ \_\ | 5 | \___/ |_| |_| |_|_____|____/___/_/ \_\ |_____|___/_/ \_\ |
6 | 6 | ||
7 | 7 | ||
8 | #-------------------# | 8 | #---------------# |
9 | # OTMEDIA LIA # | 9 | # OTMEDIA LIA # |
10 | # README # | 10 | # README # |
11 | # version 1.0 # | 11 | # version 1.0 # |
12 | #-------------------# | 12 | #---------------# |
13 | 13 | ||
14 | DESCRIPTION | 14 | DESCRIPTION |
15 | ----------- | 15 | ----------- |
16 | 16 | ||
17 | OTMEDIA stands for "Observatoire Transmedia"; its main objective is to study the evolution and transformation of the media world. | 17 | OTMEDIA stands for "Observatoire Transmedia"; its main objective is to study the evolution and transformation of the media world. |
18 | The scientific objective of the project is the creation of a new generation of media observatory | 18 | The scientific objective of the project is the creation of a new generation of media observatory |
19 | based on an interactive, semi-automatic transmedia analysis system, built to understand | 19 | based on an interactive, semi-automatic transmedia analysis system, built to understand |
20 | the world of information and its developments. | 20 | the world of information and its developments. |
21 | 21 | ||
22 | Web Site : http://www.otmedia.fr | 22 | Web Site : http://www.otmedia.fr |
23 | 23 | ||
24 | OTMEDIA LIA project is a set of tools to transcribe radio and TV shows. | 24 | OTMEDIA LIA project is a set of tools to transcribe radio and TV shows. |
25 | It does several things : | ||
26 | - First pass : default transcription with speeral and speaker diarization. | ||
27 | - Second pass : speaker adaptation and a second transcription pass with speeral. | ||
28 | - Confidence pass : computes confidence measures from the transcription output. | ||
29 | - Exploit Confidence Measure : uses SOLR DB data to extend the lexicon in low-confidence zones and create trigg files. | ||
30 | - Third pass : a second-pass-style transcription using the new lexicon and trigg files. | ||
31 | |||
25 | 32 | ||
26 | DEPENDENCIES | 33 | DEPENDENCIES |
27 | ------------ | 34 | ------------ |
28 | 35 | ||
29 | GNU Toolchain | 36 | GNU Toolchain |
30 | Available from : http://www.gnu.org | 37 | Available from : http://www.gnu.org |
31 | and debian packages | 38 | and debian packages |
32 | 39 | ||
33 | Compiling, linking, and building applications. | 40 | Compiling, linking, and building applications. |
34 | 41 | ||
35 | 42 | ||
36 | avconv (libav-tools >= 0.8) | 43 | avconv (libav-tools >= 0.8) |
37 | Available from : http://libav.org | 44 | Available from : http://libav.org |
38 | and debian package | 45 | and debian package |
39 | 46 | ||
40 | avconv is a very fast video and audio converter. | 47 | avconv is a very fast video and audio converter. |
41 | 48 | ||
42 | JAVA JDK and JRE ( >= 6) | 49 | JAVA JDK and JRE ( >= 6) |
43 | Available from : http://www.oracle.com | 50 | Available from : http://www.oracle.com |
44 | and debian packages | 51 | and debian packages |
45 | 52 | ||
46 | Java Development Kit and Java Runtime Environment. | 53 | Java Development Kit and Java Runtime Environment. |
47 | 54 | ||
48 | Python ( >= 2.7.0) | 55 | Python ( >= 2.7.0) |
49 | Available from : http://www.python.org/ | 56 | Available from : http://www.python.org/ |
50 | and debian packages | 57 | and debian packages |
51 | 58 | ||
52 | Python is a programming language. | 59 | Python is a programming language. |
53 | 60 | ||
54 | Perl ( >= 5.0.0) | 61 | Perl ( >= 5.0.0) |
55 | Available from : http://www.perl.org/ | 62 | Available from : http://www.perl.org/ |
56 | and debian packages | 63 | and debian packages |
57 | 64 | ||
58 | Perl is a programming language. | 65 | Perl is a programming language. |
59 | 66 | ||
60 | iconvi ( >= 2.0.0) | 67 | iconv ( >= 2.0.0) |
61 | Available from : http://www.gnu.org | 68 | Available from : http://www.gnu.org |
62 | and debian package | 69 | and debian package |
63 | 70 | ||
64 | Character set conversion. | 71 | Character set conversion. |
65 | 72 | ||
66 | csh shell (csh) | 73 | csh shell (csh) |
67 | Available on debian packages. | 74 | Available on debian packages. |
68 | 75 | ||
69 | The C shell was originally written at UCB to overcome limitations in the | 76 | The C shell was originally written at UCB to overcome limitations in the |
70 | Bourne shell. Its flexibility and comfort (at that time) quickly made it | 77 | Bourne shell. Its flexibility and comfort (at that time) quickly made it |
71 | the shell of choice until more advanced shells like ksh, bash, zsh or | 78 | the shell of choice until more advanced shells like ksh, bash, zsh or |
72 | tcsh appeared. Most of the latter incorporate features original to csh | 79 | tcsh appeared. Most of the latter incorporate features original to csh |
73 | 80 | ||
74 | The SRI Language Modeling Toolkit (SRILM >= 1.6.0) | 81 | The SRI Language Modeling Toolkit (SRILM >= 1.6.0) |
75 | Available from : http://www.speech.sri.com/projects/srilm/ | 82 | Available from : http://www.speech.sri.com/projects/srilm/ |
76 | 83 | ||
77 | SRILM is a toolkit for building and applying statistical language models. | 84 | SRILM is a toolkit for building and applying statistical language models. |
78 | 85 | ||
79 | Tomcat ( >= 7.0.0) | 86 | Tomcat ( >= 7.0.0) |
80 | Available from : http://tomcat.apache.org/ | 87 | Available from : http://tomcat.apache.org/ |
81 | and debian packages | 88 | and debian packages |
82 | 89 | ||
83 | Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies. | 90 | Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies. |
84 | 91 | ||
85 | INSTALLATION | 92 | INSTALLATION |
86 | ------------ | 93 | ------------ |
87 | 94 | ||
88 | See the INSTALL file for the installation procedure. | 95 | See the INSTALL file for the installation procedure. |
89 | 96 | ||
90 | Quick install below. | 97 | Quick install below. |
91 | 98 | ||
92 | Before launch installation : | 99 | Before launching installation : |
93 | 100 | ||
94 | Be certain that all dependencies are satisfied. | 101 | Be certain that all dependencies are satisfied. |
102 | Have 300 GB of free space for a complete install. | ||
95 | 103 | ||
96 | Issue the following commands to the shell : | 104 | Issue the following commands to the shell : |
97 | $> ./install.sh | 105 | $> ./install.sh |
98 | $> export OTMEDIA_HOME=path/to/OTMEDIA/directory | 106 | $> export OTMEDIA_HOME=path/to/OTMEDIA/directory |
99 | 107 | ||
100 | Read SOLR.INSTALL part 3/ to install SOLRDB. | 108 | Read SOLR.INSTALL part 3 to install the SOLR DB. |
101 | 109 | ||
102 | RUNNING | 110 | RUNNING |
103 | ------- | 111 | ------- |
104 | 112 | ||
105 | See HOWTO file. | 113 | See HOWTO file. |
106 | 114 | ||
107 | ACKNOWLEDGEMENTS | 115 | ACKNOWLEDGEMENTS |
108 | ---------------- | 116 | ---------------- |
109 | 117 | ||
110 | Many thanks to Jean-François Rey for useful help and work done. | 118 | Many thanks to Jean-François Rey for useful help and work done. |
111 | 119 | ||
112 | KNOWN BUGS | 120 | KNOWN BUGS |
113 | ---------- | 121 | ---------- |
114 | 122 | ||
115 | Many. | 123 | Many. |
124 | For bug reports, please contact Pascal Nocera at pascal.nocera@univ-avignon.fr | ||
116 | 125 | ||
117 | COPYRIGHT | 126 | COPYRIGHT |
118 | --------- | 127 | --------- |
119 | 128 | ||
120 | See the COPYING file. | 129 | See the COPYING file. |
121 | 130 | ||
122 | AUTHORS | 131 | AUTHORS |
123 | ------- | 132 | ------- |
124 | 133 | ||
125 | Jean-François Rey <jean-francois.rey@univ-avignon.fr> | 134 | Jean-François Rey <jean-francois.rey@univ-avignon.fr> |
126 | Hugo Mauchrétien <hugo.mauchretien@univ-avignon.fr> | 135 | Hugo Mauchrétien <hugo.mauchretien@univ-avignon.fr> |
127 | Emmanuel Ferreira <emmanuel.ferreira@univ-avignon.fr> | 136 | Emmanuel Ferreira <emmanuel.ferreira@univ-avignon.fr> |
128 | 137 | ||
129 | 138 |
SOLR.INSTALL
1 | ################ | 1 | ################ |
2 | # SOLR INSTALL # | 2 | # SOLR INSTALL # |
3 | ################ | 3 | ################ |
4 | # | 4 | # |
5 | # Author Jean-François Rey | 5 | # Author Jean-François Rey |
6 | # Version : 1.0 | 6 | # Version : 1.0 |
7 | # Date : 18/07/2013 | 7 | # Date : 18/07/2013 |
8 | # | 8 | # |
9 | 9 | ||
10 | 1/ Edit install.sh and put CONFPASS=1 | 10 | 1/ Edit install.sh and put CONFPASS=1 |
11 | 11 | ||
12 | 2/ Run install.sh; it checks that Tomcat is installed, downloads and untars the otmedia SOLR DB, and asks for the SOLR service info. | 12 | 2/ Run install.sh; it checks that Tomcat is installed, downloads and untars the otmedia SOLR DB, and asks for the SOLR service info. |
13 | 13 | ||
14 | 3/ Configure Tomcat and SOLR | 14 | 3/ Configure Tomcat and SOLR |
15 | 15 | ||
16 | SOLR_OTMEDIA_PATH=OTMEDIA_PATH/tools/SOLR/otemdia-2013-04 | 16 | The otmedia-2013-04 SOLR DB is untarred in : |
17 | SOLR_OTMEDIA_PATH=OTMEDIA_HOME/tools/SOLR/otmedia-2013-04 | ||
17 | 18 | ||
18 | 3.1/ Set context file | 19 | 3.1/ Set context file |
19 | ---------------- | 20 | ---------------- |
20 | 21 | ||
21 | - in SOLR_OTMEDIA_PATH/solr/otmedia-document/solr-tomcat-deploy/solr-otmedia-document.xml | 22 | - in SOLR_OTMEDIA_PATH/solr/otmedia-document/solr-tomcat-deploy/solr-otmedia-document.xml |
22 | change docBase to docBase="SOLR_OTMEDIA_PATH/solr/otmedia-document/apache-solr-3.5.0.war" | 23 | change docBase to docBase="SOLR_OTMEDIA_PATH/solr/otmedia-document/apache-solr-3.5.0.war" |
23 | and value to value="SOLR_OTMEDIA_PATH/solr/otmedia-document/" | 24 | and value to value="SOLR_OTMEDIA_PATH/solr/otmedia-document/" |
24 | 25 | ||
25 | - in SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/solr-tomcat-deploy/solr-otmedia-multimedia.xml | 26 | - in SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/solr-tomcat-deploy/solr-otmedia-multimedia.xml |
26 | change docBase to docBase="SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/apache-solr-3.5.0.war" | 27 | change docBase to docBase="SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/apache-solr-3.5.0.war" |
27 | and value to value="SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/" | 28 | and value to value="SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/" |
28 | 29 | ||
29 | 3.2/ SOLR data configuration | 30 | 3.2/ SOLR data configuration |
30 | ----------------------- | 31 | ----------------------- |
31 | 32 | ||
32 | - in SOLR_OTMEDIA_PATH/solr/otmedia-document/conf/solrconfig.xml | 33 | - in SOLR_OTMEDIA_PATH/solr/otmedia-document/conf/solrconfig.xml |
33 | change datadir (solr.data.dir) to SOLR_OTMEDIA_PATH/index/otmedia-document/ | 34 | change datadir (solr.data.dir) to SOLR_OTMEDIA_PATH/index/otmedia-document/ |
34 | 35 | ||
35 | - in SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/conf/solrconfig.xml | 36 | - in SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/conf/solrconfig.xml |
36 | change datadir (solr.data.dir) to SOLR_OTMEDIA_PATH/index/otmedia-multimedia/ | 37 | change datadir (solr.data.dir) to SOLR_OTMEDIA_PATH/index/otmedia-multimedia/ |
37 | 38 | ||
38 | 3.3/ Add SOLR DB to Tomcat | 39 | 3.3/ Add SOLR DB to Tomcat |
39 | --------------------- | 40 | --------------------- |
40 | 41 | ||
41 | - in tomcat/Catalina/localhost/ (mainly in /etc/tomcat/Catalina/localhost or /var/lib/tomcat/conf/Catalina/localhost) | 42 | - in tomcat/Catalina/localhost/ (mainly in /etc/tomcat/Catalina/localhost or /var/lib/tomcat/conf/Catalina/localhost) |
42 | run : $> ln -s SOLR_OTMEDIA_PATH/solr/otmedia-document/solr-tomcat-deploy/solr-otmedia-document.xml solr-otmedia-document.xml | 43 | run : $> ln -s SOLR_OTMEDIA_PATH/solr/otmedia-document/solr-tomcat-deploy/solr-otmedia-document.xml solr-otmedia-document.xml |
43 | run : $> ln -s SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/solr-tomcat-deploy/solr-otmedia-multimedia.xml solr-otmedia-multimedia.xml | 44 | run : $> ln -s SOLR_OTMEDIA_PATH/solr/otmedia-multimedia/solr-tomcat-deploy/solr-otmedia-multimedia.xml solr-otmedia-multimedia.xml |
44 | 45 | ||
45 | 4/ Tomcat trouble | 46 | 4/ Tomcat trouble |
46 | 47 | ||
47 | 4.1/ SOLR uses a lot of memory; you may need to increase the Java heap space ! | 48 | 4.1/ SOLR uses a lot of memory; you may need to increase the Java heap space ! |
48 | ------------------------- | 49 | ------------------------- |
49 | 50 | ||
50 | - in catalina.sh (/usr/share/tomcat/bin) | 51 | - in catalina.sh (/usr/share/tomcat/bin) |
51 | add CATALINA_OPTS="$CATALINA_OPTS -Xms256m -Xmx512m" | 52 | add CATALINA_OPTS="$CATALINA_OPTS -Xms256m -Xmx512m" |
52 | 53 | ||
53 | 4.2/ Directory permissions | 54 | 4.2/ Directory permissions |
54 | --------------------- | 55 | --------------------- |
55 | 56 | ||
56 | - SOLR_OTMEDIA_PATH and its subdirectories (and files) must belong to the tomcat group (and to the tomcat user, if the default user does not belong to the tomcat group). | 57 | - SOLR_OTMEDIA_PATH and its subdirectories (and files) must belong to the tomcat group (and to the tomcat user, if the default user does not belong to the tomcat group). |
57 | chgrp -R tomcat7 otmedia-2013-04 | 58 | chgrp -R tomcat7 otmedia-2013-04 |
58 | chmod -R g+rx otmedia-2013-04 | 59 | chmod -R g+rx otmedia-2013-04 |
59 | 60 | ||
60 | 5/ Test | 61 | 5/ Test |
61 | 62 | ||
62 | You can test with these requests (change the IP and port): | 63 | You can test with these requests (change the IP and port): |
63 | http://localhost:8080/solr-otmedia-multimedia/select?q=test+bonus+&fq=docDate:[2011-12-30T00\:00\:01Z+TO+2012-01-01T23\:59\:59Z] | 64 | http://localhost:8080/solr-otmedia-multimedia/select?q=test+bonus+&fq=docDate:[2011-12-30T00\:00\:01Z+TO+2012-01-01T23\:59\:59Z] |
64 | http://localhost:8080/solr-otmedia-document/select?q=test+bonus+&fq=docDate:[2011-12-30T00\:00\:01Z+TO+2012-01-01T23\:59\:59Z] | 65 | http://localhost:8080/solr-otmedia-document/select?q=test+bonus+&fq=docDate:[2011-12-30T00\:00\:01Z+TO+2012-01-01T23\:59\:59Z] |
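The two test URLs share the same shape; a small sketch that assembles one from its parts (localhost:8080 is the assumed default host and port):

```shell
#!/bin/bash
# Build one of the SOLR test query URLs above from its parts;
# host, port, and query values are the assumed defaults from this document.
HOST=localhost; PORT=8080; CORE=solr-otmedia-document
Q="test+bonus+"
FQ='docDate:[2011-12-30T00\:00\:01Z+TO+2012-01-01T23\:59\:59Z]'
URL="http://${HOST}:${PORT}/${CORE}/select?q=${Q}&fq=${FQ}"
echo "$URL"
# To actually run the request: curl "$URL"
```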
65 | 66 | ||
66 | 67 |
install.sh
1 | #!/bin/bash | 1 | #!/bin/bash |
2 | 2 | ||
3 | #-------------------# | 3 | #-------------------# |
4 | # OTMEDIA LIA # | ||
4 | # Install script # | 5 | # Install script # |
5 | # OTMEDIA # | 6 | # version : 1.0.0 # |
6 | #-------------------# | 7 | #-------------------# |
7 | 8 | ||
8 | # Color variables | 9 | # Color variables |
9 | txtgrn=$(tput setaf 2) # Green | 10 | txtgrn=$(tput setaf 2) # Green |
10 | txtylw=$(tput setaf 3) # Yellow | 11 | txtylw=$(tput setaf 3) # Yellow |
11 | txtblu=$(tput setaf 4) # Blue | 12 | txtblu=$(tput setaf 4) # Blue |
12 | txtpur=$(tput setaf 5) # Purple | 13 | txtpur=$(tput setaf 5) # Purple |
13 | txtcyn=$(tput setaf 6) # Cyan | 14 | txtcyn=$(tput setaf 6) # Cyan |
14 | txtwht=$(tput setaf 7) # White | 15 | txtwht=$(tput setaf 7) # White |
15 | txtrst=$(tput sgr0) # Text reset. | 16 | txtrst=$(tput sgr0) # Text reset. |
16 | #/color | 17 | #/color |
17 | 18 | ||
18 | # | 19 | # |
19 | ### Global Variables | 20 | ### Global Variables |
20 | # | 21 | # |
21 | PWD=$(pwd) | 22 | PWD=$(pwd) |
22 | OTMEDIA_HOME=$PWD | 23 | OTMEDIA_HOME=$PWD |
23 | test=$(arch) | 24 | test=$(arch) |
24 | if [ "$test" == "x86_64" ]; then ARCH=".64"; else ARCH=""; fi | 25 | if [ "$test" == "x86_64" ]; then ARCH=".64"; else ARCH=""; fi |
25 | #/Global | 26 | #/Global |
26 | 27 | ||
27 | 28 | ||
28 | # | 29 | # |
29 | # Put to 0 to disable dependencies of a pass | 30 | # Put to 0 to disable dependencies of a pass |
30 | # and 1 to enable | 31 | # and 1 to enable |
31 | # | 32 | # |
32 | PASS1=1 # First Pass | 33 | PASS1=1 # First Pass |
33 | PASS2=1 # Second Pass | 34 | PASS2=1 # Second and Third Pass |
34 | CONFPASS=1 # Confidence Pass | 35 | CONFPASS=1 # Confidence Pass |
35 | EXPLOITCONFPASS=1 # SOLR query and trigg | 36 | EXPLOITCONFPASS=1 # SOLR query and trigg |
36 | 37 | ||
37 | echo -e "\nWill do install for :" | 38 | echo -e "\nWill do install for :" |
38 | if [ $PASS1 -eq 1 ];then echo "- Pass 1";fi | 39 | if [ $PASS1 -eq 1 ];then echo "- Pass 1";fi |
39 | if [ $PASS2 -eq 1 ];then echo "- Pass 2";fi | 40 | if [ $PASS2 -eq 1 ];then echo "- Pass 2";fi |
40 | if [ $CONFPASS -eq 1 ];then echo "- Confidence Pass";fi | 41 | if [ $CONFPASS -eq 1 ];then echo "- Confidence Pass";fi |
41 | if [ $EXPLOITCONFPASS -eq 1 ];then echo "- Exploit Confidence Pass";fi | 42 | if [ $EXPLOITCONFPASS -eq 1 ];then echo "- Exploit Confidence Pass";fi |
42 | 43 | ||
43 | # | 44 | # |
44 | ### CHECK Dependencies ### | 45 | ### CHECK Dependencies ### |
45 | # | 46 | # |
46 | echo -e "\n\t${txtblu}Check Dependencies${txtrst}\n" | 47 | echo -e "\n\t${txtblu}Check Dependencies${txtrst}\n" |
47 | 48 | ||
48 | ## make | 49 | ## make |
49 | test=$(whereis make) | 50 | test=$(whereis make) |
50 | if [ "$test" == "make:" ] | 51 | if [ "$test" == "make:" ] |
51 | then | 52 | then |
52 | echo -e "${txtpur}ERROR${txtrst} make not found\n You have to install make\n sudo apt-get install make" | 53 | echo -e "${txtpur}ERROR${txtrst} make not found\n You have to install make\n sudo apt-get install make" |
53 | exit 1; | 54 | exit 1; |
54 | fi | 55 | fi |
55 | echo -e "make \t ${txtgrn}OK${txtrst}" | 56 | echo -e "make \t ${txtgrn}OK${txtrst}" |
56 | 57 | ||
57 | ## CC | 58 | ## CC |
58 | test=$(whereis cc) | 59 | test=$(whereis cc) |
59 | if [ "$test" == "cc:" ] | 60 | if [ "$test" == "cc:" ] |
60 | then | 61 | then |
61 | echo -e "${txtpur}ERROR${txtrst} cc not found\n You have to install cc\n sudo apt-get install gcc" | 62 | echo -e "${txtpur}ERROR${txtrst} cc not found\n You have to install cc\n sudo apt-get install gcc" |
62 | exit 1; | 63 | exit 1; |
63 | fi | 64 | fi |
64 | echo -e "cc \t ${txtgrn}OK${txtrst}" | 65 | echo -e "cc \t ${txtgrn}OK${txtrst}" |
65 | 66 | ||
66 | ## AVCONV | 67 | ## AVCONV |
67 | test=$(whereis avconv) | 68 | test=$(whereis avconv) |
68 | if [ "$test" == "avconv:" ] | 69 | if [ "$test" == "avconv:" ] |
69 | then | 70 | then |
70 | echo -e "${txtpur}ERROR${txtrst} avconv not found\n You have to install avconv\n sudo apt-get install libav-tools" | 71 | echo -e "${txtpur}ERROR${txtrst} avconv not found\n You have to install avconv\n sudo apt-get install libav-tools" |
71 | exit 1; | 72 | exit 1; |
72 | fi | 73 | fi |
73 | echo -e "libav-tools : avconv \t ${txtgrn}OK${txtrst}" | 74 | echo -e "libav-tools : avconv \t ${txtgrn}OK${txtrst}" |
74 | 75 | ||
75 | ## JAVA | 76 | ## JAVA |
76 | test=$(whereis java) | 77 | test=$(whereis java) |
77 | if [ "$test" == "java:" ] | 78 | if [ "$test" == "java:" ] |
78 | then | 79 | then |
79 | echo -e "${txtpur}ERROR${txtrst} java not found\n You have to install java JRE\n sudo apt-get install openjdk-7-jre" | 80 | echo -e "${txtpur}ERROR${txtrst} java not found\n You have to install java JRE\n sudo apt-get install openjdk-7-jre" |
80 | exit 1; | 81 | exit 1; |
81 | fi | 82 | fi |
82 | echo -e "Java : JRE \t ${txtgrn}OK${txtrst}" | 83 | echo -e "Java : JRE \t ${txtgrn}OK${txtrst}" |
83 | test=$(whereis javac) | 84 | test=$(whereis javac) |
84 | if [ "$test" == "javac:" ] | 85 | if [ "$test" == "javac:" ] |
85 | then | 86 | then |
86 | echo -e "${txtpur}ERROR${txtrst} javac not found\n You have to install java JDK\n sudo apt-get install openjdk-7-jdk" | 87 | echo -e "${txtpur}ERROR${txtrst} javac not found\n You have to install java JDK\n sudo apt-get install openjdk-7-jdk" |
87 | exit 1; | 88 | exit 1; |
88 | fi | 89 | fi |
89 | echo -e "Java : JDK \t ${txtgrn}OK${txtrst}" | 90 | echo -e "Java : JDK \t ${txtgrn}OK${txtrst}" |
90 | 91 | ||
91 | if [ $EXPLOITCONFPASS -eq 1 ] | 92 | if [ $EXPLOITCONFPASS -eq 1 ] |
92 | then | 93 | then |
93 | ## Python | 94 | ## Python |
94 | test=$(whereis python) | 95 | test=$(whereis python) |
95 | if [ "$test" == "python:" ] | 96 | if [ "$test" == "python:" ] |
96 | then | 97 | then |
97 | echo -e "${txtpur}ERROR${txtrst} python not found\n You have to install python\n sudo apt-get install python" | 98 | echo -e "${txtpur}ERROR${txtrst} python not found\n You have to install python\n sudo apt-get install python" |
98 | exit 1; | 99 | exit 1; |
99 | fi | 100 | fi |
100 | echo -e "python : \t ${txtgrn}OK${txtrst}" | 101 | echo -e "python : \t ${txtgrn}OK${txtrst}" |
102 | |||
103 | ## csh shell | ||
104 | test=$(whereis csh) | ||
105 | if [ "$test" == "csh:" ] | ||
106 | then | ||
107 | echo -e "${txtpur}ERROR${txtrst} csh shell not found\n You have to install csh shell\n sudo apt-get install csh" | ||
108 | exit 1; | ||
109 | fi | ||
110 | echo -e "csh shell : \t ${txtgrn}OK${txtrst}" | ||
101 | fi | 111 | fi |
102 | 112 | ||
103 | ## Perl | 113 | ## Perl |
104 | test=$(whereis perl) | 114 | test=$(whereis perl) |
105 | if [ "$test" == "perl:" ] | 115 | if [ "$test" == "perl:" ] |
106 | then | 116 | then |
107 | echo -e "${txtpur}ERROR${txtrst} perl not found\n You have to install perl\n sudo apt-get install perl" | 117 | echo -e "${txtpur}ERROR${txtrst} perl not found\n You have to install perl\n sudo apt-get install perl" |
108 | exit 1; | 118 | exit 1; |
109 | fi | 119 | fi |
110 | echo -e "perl : \t ${txtgrn}OK${txtrst}" | 120 | echo -e "perl : \t ${txtgrn}OK${txtrst}" |
111 | 121 | ||
112 | ## iconv | 122 | ## iconv |
113 | test=$(whereis iconv) | 123 | test=$(whereis iconv) |
114 | if [ "$test" == "iconv:" ] | 124 | if [ "$test" == "iconv:" ] |
115 | then | 125 | then |
116 | echo -e "${txtpur}ERROR${txtrst} iconv not found\n You have to install iconv\n sudo apt-cache search iconv" | 126 | echo -e "${txtpur}ERROR${txtrst} iconv not found\n You have to install iconv\n sudo apt-cache search iconv" |
117 | exit 1; | 127 | exit 1; |
118 | fi | 128 | fi |
119 | echo -e "iconv : \t ${txtgrn}OK${txtrst}" | 129 | echo -e "iconv : \t ${txtgrn}OK${txtrst}" |
120 | 130 | ||
121 | ## csh shell | ||
122 | test=$(whereis csh) | ||
123 | if [ "$test" == "csh:" ] | ||
124 | then | ||
125 | echo -e "${txtpur}ERROR${txtrst} csh shell not found\n You have to install csh shell\n sudo apt-get install csh" | ||
126 | exit 1; | ||
127 | fi | ||
128 | echo -e "csh shell : \t ${txtgrn}OK${txtrst}" | ||
129 | |||
130 | ## SRI LM | 131 | ## SRI LM |
131 | if [ -z "$SRILM" ] && [ -z "$MACHINE_TYPE" ] | 132 | if [ -z "$SRILM" ] && [ -z "$MACHINE_TYPE" ] |
132 | then | 133 | then |
133 | echo -e "${txtpur}ERROR${txtrst} SRILM toolkit variables are not defined (SRILM and MACHINE_TYPE)\n You have to install SRILM Toolkit\n" | 134 | echo -e "${txtpur}ERROR${txtrst} SRILM toolkit variables are not defined (SRILM and MACHINE_TYPE)\n You have to install SRILM Toolkit\n" |
134 | exit 1; | 135 | exit 1; |
135 | fi | 136 | fi |
136 | export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE | 137 | export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE |
137 | echo -e "SRILM toolkit : \t ${txtgrn}OK${txtrst}" | 138 | echo -e "SRILM toolkit : \t ${txtgrn}OK${txtrst}" |
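As a side note, the whereis-based checks above compare against the literal "name:" output; a hedged alternative using command -v avoids parsing output entirely:

```shell
#!/bin/bash
# Alternative dependency check: command -v sets a clean exit status,
# so no output string has to be parsed (sketch, not the project's script).
require() {
  command -v "$1" >/dev/null 2>&1 || { echo "ERROR: $1 not found; try: $2"; exit 1; }
  echo "$1 : OK"
}
require sh "install a POSIX shell"
```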
138 | 139 | ||
139 | |||
140 | |||
141 | ### Speeral Configuration ### | 140 | ### Speeral Configuration ### |
142 | 141 | ||
143 | echo -e "\n\t${txtblu}Speeral configuration${txtrst}\n" | 142 | echo -e "\n\t${txtblu}Speeral configuration${txtrst}\n" |
144 | echo -e "Download Speeral bin and data :" | 143 | echo -e "Download Speeral bin and data :" |
145 | scp -r rey@nyx:~/OTMEDIA_DATA/Speeral $OTMEDIA_HOME/tools/ | 144 | scp -r rey@nyx:~/OTMEDIA_DATA/Speeral $OTMEDIA_HOME/tools/ |
146 | echo -e "\n\t${txtblu}Generating Speeral configuration files :${txtrst}\n" | 145 | echo -e "\n\t${txtblu}Generating Speeral configuration files :${txtrst}\n" |
147 | cat $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ | 146 | cat $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ |
148 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ | 147 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ |
149 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ | 148 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ |
150 | > $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml | 149 | > $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml |
151 | echo $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml | 150 | echo $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml |
152 | cat $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ | 151 | cat $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ |
153 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ | 152 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ |
154 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ | 153 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ |
155 | > $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml | 154 | > $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml |
156 | echo $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml | 155 | echo $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml |
157 | cat $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ | 156 | cat $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ |
158 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ | 157 | | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ |
159 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ | 158 | | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ |
160 | > $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml | 159 | > $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml |
161 | echo $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml | 160 | echo $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml |
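The three blocks above share one pattern: pipe a `.tmp` template through `sed`, replacing the `<nom>`, `<ngramme>` and `<binode>` elements with local resource paths. A minimal sketch of that substitution, using a throwaway template and a hypothetical lexicon path instead of the real `SpeeralFirstPass.xml.tmp`:

```shell
# Hypothetical stand-in for $PWD/tools/Speeral/LEX/LEXIQUE_V6.speer
LEX="/opt/speeral/LEX/LEXIQUE_V6.speer"
# Throwaway template playing the role of Speeral*.xml.tmp
tmp=$(mktemp)
echo '<nom>OLD</nom><binode>OLD</binode>' > "$tmp"
# Same sed idiom as above: '|' as delimiter so '/' in paths needs no escaping,
# [^<]* matches the old element content
result=$(cat "$tmp" \
    | sed -e "s|<nom>[^<]*</nom>|<nom>$LEX</nom>|g" \
    | sed -e "s|<binode>[^<]*</binode>|<binode>$LEX.bin</binode>|g")
rm -f "$tmp"
echo "$result"
```

Using `|` as the `s` command delimiter is what lets the absolute paths pass through unescaped.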
162 | 161 | ||
163 | 162 | ||
164 | if [ $EXPLOITCONFPASS -eq 1 ] | 163 | if [ $EXPLOITCONFPASS -eq 1 ] |
165 | then | 164 | then |
166 | ### LIA ltbox ### | 165 | ### LIA ltbox ### |
167 | echo -e "\t${txtblu}Install lia_ltbox${txtrst}\n" | 166 | echo -e "\t${txtblu}Install lia_ltbox${txtrst}\n" |
168 | export LIA_TAGG_LANG="french" | 167 | export LIA_TAGG_LANG="french" |
169 | export LIA_TAGG="$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" | 168 | export LIA_TAGG="$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" |
170 | export LIA_PHON_REP="$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" | 169 | export LIA_PHON_REP="$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" |
171 | export LIA_BIGLEX="$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" | 170 | export LIA_BIGLEX="$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" |
172 | 171 | ||
173 | ### config lia_phon | 172 | ### config lia_phon |
174 | cd $LIA_PHON_REP | 173 | cd $LIA_PHON_REP |
175 | make all > /dev/null | 174 | make all > /dev/null |
176 | make ressource > /dev/null | 175 | make ressource > /dev/null |
177 | ### config lia_tagg | 176 | ### config lia_tagg |
178 | cd $LIA_TAGG | 177 | cd $LIA_TAGG |
179 | make all > /dev/null | 178 | make all > /dev/null |
180 | make ressource.french > /dev/null | 179 | make ressource.french > /dev/null |
181 | ### config lia_biglex | 180 | ### config lia_biglex |
182 | cd $LIA_BIGLEX | 181 | cd $LIA_BIGLEX |
183 | make -f makefile.biglex > /dev/null | 182 | make -f makefile.biglex > /dev/null |
184 | cd $OTMEDIA_HOME | 183 | cd $OTMEDIA_HOME |
185 | 184 | ||
186 | 185 | ||
187 | ### SOLR DB ### | 186 | ### SOLR DB ### |
188 | # Tomcat firstly | 187 | # Tomcat firstly |
189 | test=$(dpkg -l | grep "^ii" | grep tomcat) | 188 | test=$(dpkg -l | grep "^ii" | grep tomcat) |
190 | if [ "$test" == "" ] | 189 | if [ "$test" == "" ] |
191 | then | 190 | then |
192 | echo -e "${txtpur}ERROR${txtrst} TOMCAT seems not to be installed\nYou have to install TOMCAT\n" | 191 | echo -e "${txtpur}ERROR${txtrst} TOMCAT seems not to be installed\nYou have to install TOMCAT\n" |
193 | exit 1; | 192 | #exit 1; |
194 | fi | 193 | fi |
195 | echo -e "\nTOMCAT : \t ${txtgrn}OK${txtrst}\n" | 194 | echo -e "\nTOMCAT : \t ${txtgrn}OK${txtrst}\n" |
196 | # SOLR secondly | 195 | # SOLR secondly |
197 | echo -e "\t${txtblu}Install SOLR DB${txtrst}\n" | 196 | echo -e "\t${txtblu}Install SOLR DB${txtrst}\n" |
198 | echo -e "You will need 300 GB of free space to install the SOLR DB" | 197 | echo -e "You will need 300 GB of free space to install the SOLR DB" |
199 | read -p "Continue ? (y/n) " solr | 198 | read -p "Continue ? (y/n) " solr |
200 | if [ "$solr" == "y" ] | 199 | if [ "$solr" == "y" ] |
201 | then | 200 | then |
202 | 201 | ||
203 | echo -e "Download SOLR DB\r" | 202 | echo -e "Download SOLR DB\r" |
204 | mkdir -p $OTMEDIA_HOME/tools/SOLR 2> /dev/null | 203 | mkdir -p $OTMEDIA_HOME/tools/SOLR 2> /dev/null |
205 | scp -r rey@nyx:~/OTMEDIA_DATA/SOLR/otmedia-2013-04.tar.gz $OTMEDIA_HOME/tools/SOLR | 204 | scp -r rey@nyx:~/OTMEDIA_DATA/SOLR/otmedia-2013-04.tar.gz $OTMEDIA_HOME/tools/SOLR |
206 | echo -e "Unzip SOLR DB\r" | 205 | echo -e "Unzip SOLR DB\r" |
207 | res=0 | 206 | res=0 |
208 | #res = $(tar -xvzf "$OTMEDIA_HOME/tools/SOLR/otmedia-2013-04.tar.gz" "$OTMEDIA_HOME/tools/SOLR/") | 207 | #res = $(tar -xvzf "$OTMEDIA_HOME/tools/SOLR/otmedia-2013-04.tar.gz" "$OTMEDIA_HOME/tools/SOLR/") |
209 | if [ $res -eq 2 ]; then echo " ${txtpur}NOT OK${txtrst}"; | 208 | if [ $res -eq 2 ]; then echo " ${txtpur}NOT OK${txtrst}"; |
210 | else echo " ${txtgrn}OK${txtrst}"; fi | 209 | else echo " ${txtgrn}OK${txtrst}"; fi |
211 | else | 210 | else |
212 | echo "Skipping SOLR install" | 211 | echo "Skipping SOLR install" |
213 | fi | 212 | fi |
214 | read -e -p "Configure SOLR DB server ? (y/n) " solr | 213 | read -e -p "Configure SOLR DB server ? (y/n) " solr |
215 | if [ "$solr" == "y" ] | 214 | if [ "$solr" == "y" ] |
216 | then | 215 | then |
217 | read -p "Enter SOLR server IP :" ip | 216 | read -p "Enter SOLR server IP :" ip |
218 | if [ "${ip}" == "" ];then ip="localhost";fi | 217 | if [ "${ip}" == "" ];then ip="localhost";fi |
219 | echo "machine = \"${ip}\"" > $OTMEDIA_HOME/tools/scripts/solrinfo.py | 218 | echo "machine = \"${ip}\"" > $OTMEDIA_HOME/tools/scripts/solrinfo.py |
220 | read -p "Enter SOLR server port :" port | 219 | read -p "Enter SOLR server port :" port |
221 | if [ "${port}" == "" ]; then port="8080";fi | 220 | if [ "${port}" == "" ]; then port="8080";fi |
222 | echo -e "\n\tSOLR server IP ${ip}" | 221 | echo -e "\n\tSOLR server IP ${ip}" |
223 | echo -e "\tSOLR server port ${port}" | 222 | echo -e "\tSOLR server port ${port}" |
224 | echo "port = \"${port}\"" >> $OTMEDIA_HOME/tools/scripts/solrinfo.py | 223 | echo "port = \"${port}\"" >> $OTMEDIA_HOME/tools/scripts/solrinfo.py |
225 | else | 224 | else |
226 | echo "Skipping SOLR DB Configuration" | 225 | echo "Skipping SOLR DB Configuration" |
227 | fi | 226 | fi |
228 | echo -e "\nSee SOLR.INSTALL file for more information\n" | 227 | echo -e "\nSee SOLR.INSTALL file for more information\n" |
229 | fi | 228 | fi |
230 | 229 | ||
231 | ### Set Variables in bashrc ### | 230 | ### Set Variables in bashrc ### |
232 | cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" > ~/.bashrc.org | 231 | cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" > ~/.bashrc.org |
233 | #cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" | grep -v "LIA_TAGG" | grep -v "LIA_PHON" | grep -v "LIA_BIGLEX" > ~/.bashrc.org | 232 | #cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" | grep -v "LIA_TAGG" | grep -v "LIA_PHON" | grep -v "LIA_BIGLEX" > ~/.bashrc.org |
234 | cp ~/.bashrc.org ~/.bashrc | 233 | cp ~/.bashrc.org ~/.bashrc |
235 | export OTMEDIA_HOME=$PWD | 234 | export OTMEDIA_HOME=$PWD |
236 | echo "export OTMEDIA_HOME=$PWD" >> ~/.bashrc | 235 | echo "export OTMEDIA_HOME=$PWD" >> ~/.bashrc |
236 | echo "export PATH=\$PATH:$PWD/main_tools" >> ~/.bashrc | ||
237 | echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE" >> ~/.bashrc | 237 | echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE" >> ~/.bashrc |
238 | #echo "export LIA_TAGG_LANG=french" >> ~/.bashrc | 238 | #echo "export LIA_TAGG_LANG=french" >> ~/.bashrc |
239 | #echo "export LIA_TAGG=$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" >> ~/.bashrc | 239 | #echo "export LIA_TAGG=$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" >> ~/.bashrc |
240 | #echo "export LIA_PHON_REP=$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" >> ~/.bashrc | 240 | #echo "export LIA_PHON_REP=$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" >> ~/.bashrc |
241 | #echo "export LIA_BIGLEX=$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" >> ~/.bashrc | 241 | #echo "export LIA_BIGLEX=$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" >> ~/.bashrc |
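One subtlety in the `~/.bashrc` lines above is quoting: `$PWD` should expand at install time, while `$PATH` must stay literal (`\$PATH`) so it expands each time the file is sourced. A small sketch of the difference, with a temp file standing in for `~/.bashrc` and a hypothetical install directory:

```shell
rcfile=$(mktemp)                 # stand-in for ~/.bashrc
PWD_AT_INSTALL="/opt/otmedia"    # hypothetical install directory
# $PWD_AT_INSTALL expands now; \$PATH is written literally and
# expands only when the rc file is sourced later
echo "export OTMEDIA_HOME=$PWD_AT_INSTALL" >> "$rcfile"
echo "export PATH=\$PATH:$PWD_AT_INSTALL/main_tools" >> "$rcfile"
line2=$(tail -n1 "$rcfile")
rm -f "$rcfile"
echo "$line2"
```

Without the backslash, the installer's own `$PATH` would be baked into the file verbatim.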
242 | 242 | ||
243 | # set global configuration file | 243 | # set global configuration file |
244 | echo "OTMEDIA_HOME=$PWD" > $OTMEDIA_HOME/cfg/main_cfg.cfg | 244 | echo "OTMEDIA_HOME=$PWD" > $OTMEDIA_HOME/cfg/main_cfg.cfg |
245 | echo "ARCH=$ARCH" >> $OTMEDIA_HOME/cfg/main_cfg.cfg | 245 | echo "ARCH=$ARCH" >> $OTMEDIA_HOME/cfg/main_cfg.cfg |
246 | echo "VERBOSE=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg | 246 | echo "VERBOSE=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg |
247 | echo "DEBUG=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg | 247 | echo "DEBUG=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg |
248 | echo "CHECK=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg | 248 | echo "CHECK=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg |
249 | echo "RERUN=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg | 249 | echo "RERUN=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg |
250 | 250 | ||
251 | echo -e "\n\t${txtgrn}### Install completed ###${txtrst}\n" | 251 | echo -e "\n\t${txtgrn}### Install completed ###${txtrst}\n" |
252 | echo -e "do : source ~/.bashrc" | 252 | echo -e "do : source ~/.bashrc" |
253 | echo -e "or set variable :\n" | 253 | echo -e "or set variable :\n" |
254 | echo "export OTMEDIA_HOME=$PWD" | 254 | echo "export OTMEDIA_HOME=$PWD" |
main_tools/ExploitConfidencePass.sh
1 | #!/bin/bash | 1 | #!/bin/bash |
2 | 2 | ||
3 | ##################################################### | 3 | ##################################################### |
4 | # File : ExploitConfidencePass.sh # | 4 | # File : ExploitConfidencePass.sh # |
5 | # Brief : Exploit the ASR confidence pass to : # | 5 | # Brief : Exploit the ASR confidence pass to : # |
6 | # -> boost the confident zone # | 6 | # -> boost the confident zone # |
7 | # -> find alternatives in non-confident zones # | 7 | # -> find alternatives in non-confident zones # |
8 | # -> dynamically extend the lexicon # | 8 | # -> dynamically extend the lexicon # |
9 | # Author : Jean-François Rey # | 9 | # Author : Jean-François Rey # |
10 | # (based on Emmanuel Ferreira # | 10 | # (based on Emmanuel Ferreira # |
11 | # and Hugo Mauchrétien's work) # | 11 | # and Hugo Mauchrétien's work) # |
12 | # Version : 1.0 # | 12 | # Version : 1.0 # |
13 | # Date : 25/06/13 # | 13 | # Date : 25/06/13 # |
14 | ##################################################### | 14 | ##################################################### |
15 | 15 | ||
16 | echo "### ExploitConfidencePass.sh ###" | 16 | echo "### ExploitConfidencePass.sh ###" |
17 | 17 | ||
18 | # Check OTMEDIA_HOME env var | 18 | # Check OTMEDIA_HOME env var |
19 | if [ -z ${OTMEDIA_HOME} ] | 19 | if [ -z ${OTMEDIA_HOME} ] |
20 | then | 20 | then |
21 | OTMEDIA_HOME=$(dirname $(dirname $(readlink -e $0))) | 21 | OTMEDIA_HOME=$(dirname $(dirname $(readlink -e $0))) |
22 | export OTMEDIA_HOME=$OTMEDIA_HOME | 22 | export OTMEDIA_HOME=$OTMEDIA_HOME |
23 | fi | 23 | fi |
24 | 24 | ||
25 | # where is ExploitConfidencePass.sh | 25 | # where is ExploitConfidencePass.sh |
26 | MAIN_SCRIPT_PATH=$(dirname $(readlink -e $0)) | 26 | MAIN_SCRIPT_PATH=$(dirname $(readlink -e $0)) |
27 | 27 | ||
28 | if [ -z ${SCRIPT_PATH} ] | 28 | if [ -z ${SCRIPT_PATH} ] |
29 | then | 29 | then |
30 | SCRIPT_PATH=$OTMEDIA_HOME/tools/scripts | 30 | SCRIPT_PATH=$OTMEDIA_HOME/tools/scripts |
31 | fi | 31 | fi |
32 | 32 | ||
33 | # Include scripts | 33 | # Include scripts |
34 | . $SCRIPT_PATH"/Tools.sh" | 34 | . $SCRIPT_PATH"/Tools.sh" |
35 | . $SCRIPT_PATH"/CheckExploitConfPass.sh" | 35 | . $SCRIPT_PATH"/CheckExploitConfPass.sh" |
36 | 36 | ||
37 | # where is ExploitConfidencePass.cfg | 37 | # where is ExploitConfidencePass.cfg |
38 | EXPLOITCONFIDENCEPASS_CONFIG_FILE=$OTMEDIA_HOME"/cfg/ExploitConfidencePass.cfg" | 38 | EXPLOITCONFIDENCEPASS_CONFIG_FILE=$OTMEDIA_HOME"/cfg/ExploitConfidencePass.cfg" |
39 | if [ -e $EXPLOITCONFIDENCEPASS_CONFIG_FILE ] | 39 | if [ -e $EXPLOITCONFIDENCEPASS_CONFIG_FILE ] |
40 | then | 40 | then |
41 | . $EXPLOITCONFIDENCEPASS_CONFIG_FILE | 41 | . $EXPLOITCONFIDENCEPASS_CONFIG_FILE |
42 | else | 42 | else |
43 | echo "ERROR : Can't find configuration file $EXPLOITCONFIDENCEPASS_CONFIG_FILE" >&2 | 43 | echo "ERROR : Can't find configuration file $EXPLOITCONFIDENCEPASS_CONFIG_FILE" >&2 |
44 | exit 1 | 44 | exit 1 |
45 | fi | 45 | fi |
46 | 46 | ||
47 | #---------------# | 47 | #---------------# |
48 | # Parse Options # | 48 | # Parse Options # |
49 | #---------------# | 49 | #---------------# |
50 | while getopts ":hDv:cf:r" opt | 50 | while getopts ":hDv:cr" opt |
51 | do | 51 | do |
52 | case $opt in | 52 | case $opt in |
53 | h) | 53 | h) |
54 | echo -e "$0 [OPTIONS] <INPUT_DIRECTORY>\n" | 54 | echo -e "$0 [OPTIONS] <INPUT_DIRECTORY>\n" |
55 | echo -e "\t Options:" | 55 | echo -e "\t Options:" |
56 | echo -e "\t\t-h :\tprint this message" | 56 | echo -e "\t\t-h :\tprint this message" |
57 | echo -e "\t\t-D :\tDEBUG mode on" | 57 | echo -e "\t\t-D :\tDEBUG mode on" |
58 | echo -e "\t\t-v l :\tVerbose mode, l=(1|2|3) level mode" | 58 | echo -e "\t\t-v l :\tVerbose mode, l=(1|2|3) level mode" |
59 | echo -e "\t\t-c :\tCheck process, stop if error detected" | 59 | echo -e "\t\t-c :\tCheck process, stop if error detected" |
60 | echo -e "\t\t-f n :\tspecify a speeral forks number (default 1)" | ||
61 | echo -e "\t\t-r :\tforce rerun without deleting files" | 60 | echo -e "\t\t-r :\tforce rerun without deleting files" |
62 | exit 1 | 61 | exit 1 |
63 | ;; | 62 | ;; |
64 | D) | 63 | D) |
65 | DEBUG=1 | 64 | DEBUG=1 |
66 | ;; | 65 | ;; |
67 | v) | 66 | v) |
68 | VERBOSE=$OPTARG | 67 | VERBOSE=$OPTARG |
69 | ;; | 68 | ;; |
70 | c) | 69 | c) |
71 | CHECK=1 | 70 | CHECK=1 |
72 | ;; | ||
73 | f) | ||
74 | FORKS="--forks $OPTARG" | ||
75 | ;; | 71 | ;; |
76 | r) | 72 | r) |
77 | RERUN=1 | 73 | RERUN=1 |
78 | ;; | 74 | ;; |
79 | :) | 75 | :) |
80 | echo "Option -$OPTARG requires an argument." >&2 | 76 | echo "Option -$OPTARG requires an argument." >&2 |
81 | exit 1 | 77 | exit 1 |
82 | ;; | 78 | ;; |
83 | \?) | 79 | \?) |
84 | echo "BAD USAGE : unknown option -$OPTARG" | 80 | echo "BAD USAGE : unknown option -$OPTARG" |
85 | #exit 1 | 81 | #exit 1 |
86 | ;; | 82 | ;; |
87 | esac | 83 | esac |
88 | done | 84 | done |
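The option loop above follows the standard `getopts` pattern: a leading `:` in the optstring selects silent error reporting, `v:` declares that `-v` takes an argument, and the `:` and `\?` cases catch missing arguments and unknown options. A self-contained sketch with just two of the flags:

```shell
# Reduced version of the loop above, parsing only -D and -v
parse_opts() {
    DEBUG=0 VERBOSE=0 OPTIND=1
    while getopts ":Dv:" opt; do
        case $opt in
            D)  DEBUG=1 ;;                 # debug mode on
            v)  VERBOSE=$OPTARG ;;         # verbosity level from the argument
            :)  echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2 ;;
        esac
    done
}
parse_opts -D -v 2
echo "DEBUG=$DEBUG VERBOSE=$VERBOSE"
```

Resetting `OPTIND=1` inside the function makes it safe to call more than once in the same shell.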
89 | 85 | ||
90 | # mode debug enable | 86 | # mode debug enable |
91 | if [ $DEBUG -eq 1 ] | 87 | if [ $DEBUG -eq 1 ] |
92 | then | 88 | then |
93 | set -x | 89 | set -x |
94 | echo -e "## Mode DEBUG ON ##" | 90 | echo -e "## Mode DEBUG ON ##" |
95 | fi | 91 | fi |
96 | 92 | ||
97 | # mode verbose enable | 93 | # mode verbose enable |
98 | if [ $VERBOSE -gt 0 ]; then echo -e "## Verbose level : $VERBOSE ##" ;fi | 94 | if [ $VERBOSE -gt 0 ]; then echo -e "## Verbose level : $VERBOSE ##" ;fi |
99 | 95 | ||
100 | # Check USAGE by arguments number | 96 | # Check USAGE by arguments number |
101 | if [ $(($#-($OPTIND-1))) -ne 1 ] | 97 | if [ $(($#-($OPTIND-1))) -ne 1 ] |
102 | then | 98 | then |
103 | echo "BAD USAGE : ExploitConfidencePass.sh [OPTIONS] <INPUT_DIRECTORY>" | 99 | echo "BAD USAGE : ExploitConfidencePass.sh [OPTIONS] <INPUT_DIRECTORY>" |
104 | echo "$0 -h for more info" | 100 | echo "$0 -h for more info" |
105 | exit 1 | 101 | exit 1 |
106 | fi | 102 | fi |
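The `$(($#-($OPTIND-1)))` test above counts the positional arguments left over once `getopts` has consumed the options: `OPTIND-1` is how many argument slots the options used. A sketch with a dummy `-D` flag and one input directory:

```shell
# After the getopts loop, OPTIND-1 slots were options,
# so $# - (OPTIND - 1) is the number of remaining positional arguments.
count_positional() {
    OPTIND=1
    while getopts ":D" opt; do :; done
    echo $(($# - (OPTIND - 1)))
}
n=$(count_positional -D some_input_dir)
echo "$n"
```

In the script itself the subsequent `shift $((OPTIND-1))` then makes that single directory argument `$1`.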
107 | 103 | ||
108 | shift $((OPTIND-1)) | 104 | shift $((OPTIND-1)) |
109 | # check input directory - first argument | 105 | # check input directory - first argument |
110 | if [ ! -e $1 ] | 106 | if [ ! -e $1 ] |
111 | then | 107 | then |
112 | print_error "can't open $1" | 108 | print_error "can't open $1" |
113 | exit 1 | 109 | exit 1 |
114 | fi | 110 | fi |
115 | 111 | ||
116 | print_info "[${BASENAME}] => ExploitConfPass start | $(date +'%d/%m/%y %H:%M:%S')" 1 | 112 | print_info "[${BASENAME}] => ExploitConfPass start | $(date +'%d/%m/%y %H:%M:%S')" 1 |
117 | 113 | ||
118 | #-------------# | 114 | #-------------# |
119 | # GLOBAL VARS # | 115 | # GLOBAL VARS # |
120 | #-------------# | 116 | #-------------# |
121 | INPUT_DIR=$(readlink -e $1) | 117 | INPUT_DIR=$(readlink -e $1) |
122 | OUTPUT_DIR=$INPUT_DIR | 118 | OUTPUT_DIR=$INPUT_DIR |
123 | BASENAME=$(basename $OUTPUT_DIR) | 119 | BASENAME=$(basename $OUTPUT_DIR) |
124 | SHOW_DIR="$OUTPUT_DIR/shows/" | 120 | SHOW_DIR="$OUTPUT_DIR/shows/" |
125 | SOLR_RES="$OUTPUT_DIR/solr/" | 121 | SOLR_RES="$OUTPUT_DIR/solr/" |
126 | EXT_LEX="$OUTPUT_DIR/LEX/" | 122 | EXT_LEX="$OUTPUT_DIR/LEX/" |
127 | TRIGGER_CONFZONE="$OUTPUT_DIR/trigg/" | 123 | TRIGGER_CONFZONE="$OUTPUT_DIR/trigg/" |
128 | LOGFILE="$OUTPUT_DIR/info_exploitconf.log" | 124 | LOGFILE="$OUTPUT_DIR/info_exploitconf.log" |
129 | ERRORFILE="$OUTPUT_DIR/error_exploitconf.log" | 125 | ERRORFILE="$OUTPUT_DIR/error_exploitconf.log" |
130 | 126 | ||
131 | CONFPASS_CONFIG_FILE="$(readlink -e $1)/ConfPass.cfg" | 127 | CONFPASS_CONFIG_FILE="$(readlink -e $1)/ConfPass.cfg" |
132 | if [ -e $CONFPASS_CONFIG_FILE ] | 128 | if [ -e $CONFPASS_CONFIG_FILE ] |
133 | then | 129 | then |
134 | { | 130 | { |
135 | RES_CONF_DIR=$(cat $CONFPASS_CONFIG_FILE | grep "^RES_CONF_DIR=" | cut -f2 -d"=") | 131 | RES_CONF_DIR=$(cat $CONFPASS_CONFIG_FILE | grep "^RES_CONF_DIR=" | cut -f2 -d"=") |
136 | RES_CONF=$(cat $CONFPASS_CONFIG_FILE | grep "^CONF_DIR=" | cut -f2 -d"=") | 132 | RES_CONF=$(cat $CONFPASS_CONFIG_FILE | grep "^CONF_DIR=" | cut -f2 -d"=") |
137 | print_info "[${BASENAME}] Use confidence measure from : $RES_CONF" 2 | 133 | print_info "[${BASENAME}] Use confidence measure from : $RES_CONF" 2 |
138 | } | 134 | } |
139 | else | 135 | else |
140 | { | 136 | { |
141 | print_error "[${BASENAME}] Can't find $CONFPASS_CONFIG_FILE" | 137 | print_error "[${BASENAME}] Can't find $CONFPASS_CONFIG_FILE" |
142 | print_error "[${BASENAME}] -> use res_p2" | 138 | print_error "[${BASENAME}] -> use res_p2" |
143 | RES_CONF_DIR="$INPUT_DIR/conf/res_p2/scored_ctm" | 139 | RES_CONF_DIR="$INPUT_DIR/conf/res_p2/scored_ctm" |
144 | RES_CONF="$INPUT_DIR/conf/res_p2" | 140 | RES_CONF="$INPUT_DIR/conf/res_p2" |
145 | } | 141 | } |
146 | fi | 142 | fi |
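The `grep | cut` idiom above pulls a single `KEY=VALUE` setting out of the saved pass configuration: anchor on `^KEY=`, then take field 2 with `=` as the delimiter. A sketch with a made-up config file:

```shell
# Throwaway config file standing in for ConfPass.cfg (values are invented)
cfg=$(mktemp)
printf 'RES_CONF_DIR=/data/conf/res_p2/scored_ctm\nCONF_DIR=/data/conf/res_p2\n' > "$cfg"
# ^-anchored grep picks exactly one key; cut keeps everything after '='
RES_CONF_DIR=$(grep "^RES_CONF_DIR=" "$cfg" | cut -f2 -d"=")
rm -f "$cfg"
echo "$RES_CONF_DIR"
```

Note that `cut -f2 -d"="` would truncate a value that itself contains `=`; the paths written by these scripts never do.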
147 | 143 | ||
148 | mkdir -p $SHOW_DIR > /dev/null 2>&1 | 144 | mkdir -p $SHOW_DIR > /dev/null 2>&1 |
149 | mkdir -p $SOLR_RES > /dev/null 2>&1 | 145 | mkdir -p $SOLR_RES > /dev/null 2>&1 |
150 | mkdir -p $EXT_LEX > /dev/null 2>&1 | 146 | mkdir -p $EXT_LEX > /dev/null 2>&1 |
151 | mkdir -p $TRIGGER_CONFZONE > /dev/null 2>&1 | 147 | mkdir -p $TRIGGER_CONFZONE > /dev/null 2>&1 |
152 | 148 | ||
153 | #------------------# | 149 | #------------------# |
154 | # Create Workspace # | 150 | # Create Workspace # |
155 | #------------------# | 151 | #------------------# |
156 | # Lock directory | 152 | # Lock directory |
157 | if [ -e "$OUTPUT_DIR/EXPLOITCONFPASS.lock" ] && [ $RERUN -eq 0 ] | 153 | if [ -e "$OUTPUT_DIR/EXPLOITCONFPASS.lock" ] && [ $RERUN -eq 0 ] |
158 | then | 154 | then |
159 | print_warn "[${BASENAME}] ExploitConfidencePass is locked -> exit" 2 | 155 | print_warn "[${BASENAME}] ExploitConfidencePass is locked -> exit" 2 |
160 | exit 1 | 156 | exit 1 |
161 | fi | 157 | fi |
162 | rm "$OUTPUT_DIR/EXPLOITCONFPASS.unlock" > /dev/null 2>&1 | 158 | rm "$OUTPUT_DIR/EXPLOITCONFPASS.unlock" > /dev/null 2>&1 |
163 | touch "$OUTPUT_DIR/EXPLOITCONFPASS.lock" > /dev/null 2>&1 | 159 | touch "$OUTPUT_DIR/EXPLOITCONFPASS.lock" > /dev/null 2>&1 |
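The lock logic above is a simple marker-file convention: skip the directory when `EXPLOITCONFPASS.lock` already exists and `-r` (rerun) was not given, otherwise drop the `.unlock` marker and create the lock. A sketch in a temporary directory:

```shell
workdir=$(mktemp -d)   # stand-in for the output directory
RERUN=0
touch "$workdir/EXPLOITCONFPASS.lock"   # pretend a previous run left its lock
if [ -e "$workdir/EXPLOITCONFPASS.lock" ] && [ $RERUN -eq 0 ]; then
    locked=1           # the real script prints a warning and exits here
else
    locked=0           # otherwise it would recreate the lock and proceed
fi
rm -rf "$workdir"
echo "$locked"
```

Because `touch`/`-e` are not atomic as a pair, this guards against accidental double launches rather than true concurrent races; that matches how the pipeline scripts use it.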
164 | 160 | ||
165 | #------# | 161 | #------# |
166 | # Save # | 162 | # Save # |
167 | #------# | 163 | #------# |
168 | cp $EXPLOITCONFIDENCEPASS_CONFIG_FILE $OUTPUT_DIR/ExploitConfPass.cfg | 164 | cp $EXPLOITCONFIDENCEPASS_CONFIG_FILE $OUTPUT_DIR/ExploitConfPass.cfg |
169 | echo "TRIGGER_DIR=$TRIGGER_CONFZONE" >> $OUTPUT_DIR/ExploitConfPass.cfg | 165 | echo "TRIGGER_DIR=$TRIGGER_CONFZONE" >> $OUTPUT_DIR/ExploitConfPass.cfg |
170 | echo "TRIGGER_SPEERAL=$TRIGGER_CONFZONE/speeral/" >> $OUTPUT_DIR/ExploitConfPass.cfg | 166 | echo "TRIGGER_SPEERAL=$TRIGGER_CONFZONE/speeral/" >> $OUTPUT_DIR/ExploitConfPass.cfg |
171 | echo "LEX_SPEERAL=$EXT_LEX/speeral/${lexname}_ext" >> $OUTPUT_DIR/ExploitConfPass.cfg | 167 | echo "LEX_SPEERAL=$EXT_LEX/speeral/${lexname}_ext" >> $OUTPUT_DIR/ExploitConfPass.cfg |
172 | echo "LEX_BINODE_SPEERAL=$EXT_LEX/speeral/${lexname}_ext.bin" >> $OUTPUT_DIR/ExploitConfPass.cfg | 168 | echo "LEX_BINODE_SPEERAL=$EXT_LEX/speeral/${lexname}_ext.bin" >> $OUTPUT_DIR/ExploitConfPass.cfg |
173 | print_info "[${BASENAME}] Save config in $OUTPUT_DIR/ExploitConfPass.cfg" 1 | 169 | print_info "[${BASENAME}] Save config in $OUTPUT_DIR/ExploitConfPass.cfg" 1 |
174 | 170 | ||
175 | #---------------# | 171 | #---------------# |
176 | # Check Pass # | 172 | # Check Pass # |
177 | #---------------# | 173 | #---------------# |
178 | if [ $( ls ${RES_CONF_DIR}/*.res 2> /dev/null | wc -l) -eq 0 ] | 174 | if [ $( ls ${RES_CONF_DIR}/*.res 2> /dev/null | wc -l) -eq 0 ] |
179 | then | 175 | then |
180 | print_error "[${BASENAME}] No Conf Pass res -> exit ExploitConfPass" | 176 | print_error "[${BASENAME}] No Conf Pass res -> exit ExploitConfPass" |
181 | if [ $CHECK -eq 1 ]; then print_log_file $ERRORFILE "No ConfPass res in ${RES_CONF_DIR}" ;fi | 177 | if [ $CHECK -eq 1 ]; then print_log_file $ERRORFILE "No ConfPass res in ${RES_CONF_DIR}" ;fi |
182 | exit 1 | 178 | exit 1 |
183 | fi | 179 | fi |
184 | 180 | ||
185 | #-----------------------# | 181 | #-----------------------# |
186 | # Segmentation by show # | 182 | # Segmentation by show # |
187 | #-----------------------# | 183 | #-----------------------# |
188 | # create txt file from scored res | 184 | # create txt file from scored res |
189 | # tag pos and lemmatization of the txt file | 185 | # tag pos and lemmatization of the txt file |
190 | # merge the scored res and taglem file | 186 | # merge the scored res and taglem file |
191 | # segment using the last generated file | 187 | # segment using the last generated file |
192 | # and create a ctm file by show | 188 | # and create a ctm file by show |
193 | 189 | ||
194 | print_info "[${BASENAME}] Segmentation by show" 1 | 190 | print_info "[${BASENAME}] Segmentation by show" 1 |
195 | 191 | ||
196 | # -> to txt | 192 | # -> to txt |
197 | print_info "[${BASENAME}] Create txt from scored res" 3 | 193 | print_info "[${BASENAME}] Create txt from scored res" 3 |
198 | cat ${RES_CONF_DIR}/*.res > $INPUT_DIR/$BASENAME.sctm | 194 | cat ${RES_CONF_DIR}/*.res > $INPUT_DIR/$BASENAME.sctm |
199 | cat $INPUT_DIR/$BASENAME.seg | $SIGMUND_BIN/myConvert.pl $INPUT_DIR/$BASENAME.sctm $INPUT_DIR/$BASENAME.tmp | 195 | cat $INPUT_DIR/$BASENAME.seg | $SIGMUND_BIN/myConvert.pl $INPUT_DIR/$BASENAME.sctm $INPUT_DIR/$BASENAME.tmp |
200 | cat $INPUT_DIR/$BASENAME.tmp | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -f | sed -e "s/_/ /g" | sort -nt 'n' -k '2' > $INPUT_DIR/$BASENAME.txt | 196 | cat $INPUT_DIR/$BASENAME.tmp | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -f | sed -e "s/_/ /g" | sort -nt 'n' -k '2' > $INPUT_DIR/$BASENAME.txt |
201 | 197 | ||
202 | # -> to tagger + lemme | 198 | # -> to tagger + lemme |
203 | print_info "[${BASENAME}] Tag pos and lem in txt file" 3 | 199 | print_info "[${BASENAME}] Tag pos and lem in txt file" 3 |
204 | iconv -t ISO_8859-1 $INPUT_DIR/$BASENAME.txt > $INPUT_DIR/$BASENAME.tmp | 200 | iconv -t ISO_8859-1 $INPUT_DIR/$BASENAME.txt > $INPUT_DIR/$BASENAME.tmp |
205 | $SIGMUND_BIN/txt2lem.sh $INPUT_DIR/$BASENAME.tmp $INPUT_DIR/$BASENAME.taglem | 201 | $SIGMUND_BIN/txt2lem.sh $INPUT_DIR/$BASENAME.tmp $INPUT_DIR/$BASENAME.taglem |
206 | 202 | ||
207 | # merge sctm and taglem | 203 | # merge sctm and taglem |
208 | print_info "[${BASENAME}] Merge scored ctm with tag pos and lem file" 3 | 204 | print_info "[${BASENAME}] Merge scored ctm with tag pos and lem file" 3 |
209 | cat $INPUT_DIR/$BASENAME.sctm | $SCRIPT_PATH/BdlexUC.pl ${RULES}/basic -f | iconv -t ISO_8859-1 | $SCRIPT_PATH/scoredCtmAndTaggedLem2All.pl $INPUT_DIR/$BASENAME.taglem > $INPUT_DIR/$BASENAME.ctl | 205 | cat $INPUT_DIR/$BASENAME.sctm | $SCRIPT_PATH/BdlexUC.pl ${RULES}/basic -f | iconv -t ISO_8859-1 | $SCRIPT_PATH/scoredCtmAndTaggedLem2All.pl $INPUT_DIR/$BASENAME.taglem > $INPUT_DIR/$BASENAME.ctl |
210 | 206 | ||
211 | # -> new seg | 207 | # -> new seg |
212 | print_info "[${BASENAME}] Create xml file and run Topic Seg" 3 | 208 | print_info "[${BASENAME}] Create xml file and run Topic Seg" 3 |
213 | $SIGMUND_BIN/tagLem2xml.pl $INPUT_DIR/$BASENAME.taglem $INPUT_DIR/$BASENAME.doc.xml | 209 | $SIGMUND_BIN/tagLem2xml.pl $INPUT_DIR/$BASENAME.taglem $INPUT_DIR/$BASENAME.doc.xml |
214 | rm $INPUT_DIR/$BASENAME.tmp #$INPUT_DIR/$BASENAME.taglem | 210 | rm $INPUT_DIR/$BASENAME.tmp #$INPUT_DIR/$BASENAME.taglem |
215 | 211 | ||
216 | # Lia_topic_seg : bring together sentences into show | 212 | # Lia_topic_seg : bring together sentences into show |
217 | cp $INPUT_DIR/$BASENAME.doc.xml 0.xml | 213 | cp $INPUT_DIR/$BASENAME.doc.xml 0.xml |
218 | java -cp $LIATOPICSEG/bin Test > $INPUT_DIR/show.seg | 214 | java -cp $LIATOPICSEG/bin Test > $INPUT_DIR/show.seg |
219 | cat $INPUT_DIR/show.seg | $SIGMUND_BIN/toSegEmiss.pl $INPUT_DIR/$BASENAME.show.seg | 215 | cat $INPUT_DIR/show.seg | $SIGMUND_BIN/toSegEmiss.pl $INPUT_DIR/$BASENAME.show.seg |
220 | rm 0.xml $INPUT_DIR/show.seg | 216 | rm 0.xml $INPUT_DIR/show.seg |
221 | 217 | ||
222 | if [ $CHECK -eq 1 ] | 218 | if [ $CHECK -eq 1 ] |
223 | then | 219 | then |
224 | if [ ! -s $INPUT_DIR/$BASENAME.show.seg ] | 220 | if [ ! -s $INPUT_DIR/$BASENAME.show.seg ] |
225 | then | 221 | then |
226 | print_error "[${BASENAME}] No Topic segmentation ! " | 222 | print_error "[${BASENAME}] No Topic segmentation ! " |
227 | print_error "[${BASENAME}] Check $ERRORFILE " | 223 | print_error "[${BASENAME}] Check $ERRORFILE " |
228 | print_log_file "$ERRORFILE" "No Topic segmentation in ${BASENAME}.show.seg" | 224 | print_log_file "$ERRORFILE" "No Topic segmentation in ${BASENAME}.show.seg" |
229 | fi | 225 | fi |
230 | fi | 226 | fi |
231 | 227 | ||
232 | # Segment ctm into several show files and create a seg list by show | 228 | # Segment ctm into several show files and create a seg list by show |
233 | print_info "[${BASENAME}] Segment ctm into show files and a seg list by show" 1 | 229 | print_info "[${BASENAME}] Segment ctm into show files and a seg list by show" 1 |
234 | $SCRIPT_PATH/ctm2show.pl $INPUT_DIR/$BASENAME.ctl $INPUT_DIR/$BASENAME.show.seg $SHOW_DIR | 230 | $SCRIPT_PATH/ctm2show.pl $INPUT_DIR/$BASENAME.ctl $INPUT_DIR/$BASENAME.show.seg $SHOW_DIR |
235 | 231 | ||
236 | #-----------------------------------------------------------# | 232 | #-----------------------------------------------------------# |
237 | # SOLR QUERIES # | 233 | # SOLR QUERIES # |
238 | # -> Create Confident Words # | 234 | # -> Create Confident Words # |
239 | # Keep conf words and use Tags # | 235 | # Keep conf words and use Tags # |
240 | # -> Query SOLR (document & multimedia) # | 236 | # -> Query SOLR (document & multimedia) # |
241 | # concat words + add dates 2 days before and after the show # | 237 | # concat words + add dates 2 days before and after the show # |
242 | # query document & multimedia # | 238 | # query document & multimedia # |
243 | #-----------------------------------------------------------# | 239 | #-----------------------------------------------------------# |
244 | print_info "[${BASENAME}] Create SOLR queries and ask SOLR" 1 | 240 | print_info "[${BASENAME}] Create SOLR queries and ask SOLR" 1 |
245 | for show in $(ls $SHOW_DIR/*.ctm) | 241 | for show in $(ls $SHOW_DIR/*.ctm) |
246 | do | 242 | do |
247 | bn=$(basename $show .ctm) | 243 | bn=$(basename $show .ctm) |
248 | # Remove words with low confidence and keep useful tagger words | 244 | # Remove words with low confidence and keep useful tagger words |
249 | cat $show | $SCRIPT_PATH/KeepConfZone.pl | grep -e "MOTINC\|NMS\|NMP\|NFS\|NFP\|X[A-Z]{3,5}" | cut -f3 -d' ' > "$SHOW_DIR/$bn.confzone" | 245 | cat $show | $SCRIPT_PATH/KeepConfZone.pl | grep -e "MOTINC\|NMS\|NMP\|NFS\|NFP\|X[A-Z]{3,5}" | cut -f3 -d' ' > "$SHOW_DIR/$bn.confzone" |
250 | # Get dates 2 days before and after the show | 246 | # Get dates 2 days before and after the show |
251 | datePattern=`$SCRIPT_PATH/daybefore2after.sh $(echo $BASENAME | cut -c1-6)` | 247 | datePattern=`$SCRIPT_PATH/daybefore2after.sh $(echo $BASENAME | cut -c1-6)` |
252 | # Create SOLR queries | 248 | # Create SOLR queries |
253 | cat $SHOW_DIR/$bn".confzone" | $SCRIPT_PATH/GenerateSOLRQueries.pl | iconv -f ISO_8859-1 -t UTF-8 > "$SHOW_DIR/$bn.queries" | 249 | cat $SHOW_DIR/$bn".confzone" | $SCRIPT_PATH/GenerateSOLRQueries.pl | iconv -f ISO_8859-1 -t UTF-8 > "$SHOW_DIR/$bn.queries" |
254 | # Ask SOLR DB | 250 | # Ask SOLR DB |
255 | if [ $(wc -w "$SHOW_DIR/$bn.queries" | cut -f1 -d' ') -gt 0 ]; then | 251 | if [ $(wc -w "$SHOW_DIR/$bn.queries" | cut -f1 -d' ') -gt 0 ]; then |
256 | query=$(cat $SHOW_DIR/$bn.queries)"&fq=docDate:[$datePattern]" | 252 | query=$(cat $SHOW_DIR/$bn.queries)"&fq=docDate:[$datePattern]" |
257 | echo $query > $SHOW_DIR/$bn.queries | 253 | echo $query > $SHOW_DIR/$bn.queries |
258 | print_info "python $SCRIPT_PATH/ProcessSOLRQueries.py $SHOW_DIR/$bn.queries $SOLR_RES/$bn.keywords.tmp $SOLR_RES/$bn.txt.tmp" 3 | 254 | print_info "python $SCRIPT_PATH/ProcessSOLRQueries.py $SHOW_DIR/$bn.queries $SOLR_RES/$bn.keywords.tmp $SOLR_RES/$bn.txt.tmp" 3 |
259 | python $SCRIPT_PATH/ProcessSOLRQueries.py $SHOW_DIR/$bn.queries $SOLR_RES/$bn.keywords.tmp $SOLR_RES/$bn.txt.tmp | 255 | python $SCRIPT_PATH/ProcessSOLRQueries.py $SHOW_DIR/$bn.queries $SOLR_RES/$bn.keywords.tmp $SOLR_RES/$bn.txt.tmp |
260 | cat $SOLR_RES/$bn.keywords.tmp | sort -u > $SOLR_RES/$bn.keywords | 256 | cat $SOLR_RES/$bn.keywords.tmp | sort -u > $SOLR_RES/$bn.keywords |
261 | cat $SOLR_RES/$bn.txt.tmp | sort -u > $SOLR_RES/$bn.txt | 257 | cat $SOLR_RES/$bn.txt.tmp | sort -u > $SOLR_RES/$bn.txt |
262 | rm $SOLR_RES/*.tmp > /dev/null 2>&1 | 258 | rm $SOLR_RES/*.tmp > /dev/null 2>&1 |
263 | fi | 259 | fi |
264 | 260 | ||
265 | if [ $CHECK -eq 1 ] | 261 | if [ $CHECK -eq 1 ] |
266 | then | 262 | then |
267 | if [ ! -e $SOLR_RES/$bn.keywords ] || [ ! -e $SOLR_RES/$bn.txt ] | 263 | if [ ! -e $SOLR_RES/$bn.keywords ] || [ ! -e $SOLR_RES/$bn.txt ] |
268 | then | 264 | then |
269 | print_warn "$bn.keywords and $bn.txt are empty !\nMaybe SOLR server is down !" 2 | 265 | print_warn "$bn.keywords and $bn.txt are empty !\nMaybe SOLR server is down !" 2 |
270 | print_log_file "$LOGFILE" "$bn.keywords and $bn.txt are empty !\nMaybe SOLR server is down !" | 266 | print_log_file "$LOGFILE" "$bn.keywords and $bn.txt are empty !\nMaybe SOLR server is down !" |
271 | fi | 267 | fi |
272 | fi | 268 | fi |
273 | 269 | ||
274 | done | 270 | done |
275 | 271 | ||
276 | #----------------------------------------------------------------------------------------------- | 272 | #----------------------------------------------------------------------------------------------- |
277 | # Build trigger file | 273 | # Build trigger file |
278 | # 1) keywords are automatically boosted in the non confident zone of the current res | 274 | # 1) keywords are automatically boosted in the non confident zone of the current res |
279 | # confident zone are boosted | 275 | # confident zone are boosted |
280 | # previous words in sensible zone are penalized | 276 | # previous words in sensible zone are penalized |
281 | # 2) OOVs are extracted + phonetized | 277 | # 2) OOVs are extracted + phonetized |
282 | # 3) Try to find OOVs acousticly in the current segment | 278 | # 3) Try to find OOVs acousticly in the current segment |
283 | # 4) Generate the .trigg file | 279 | # 4) Generate the .trigg file |
284 | #------------------------------------------------------------------------------------------------ | 280 | #------------------------------------------------------------------------------------------------ |
285 | print_info "[${BASENAME}] Build trigger files" 1 | 281 | print_info "[${BASENAME}] Build trigger files" 1 |
286 | for i in `ls $SOLR_RES/*.keywords` | 282 | for i in `ls $SOLR_RES/*.keywords` |
287 | do | 283 | do |
288 | basename=`basename $i .keywords` | 284 | basename=`basename $i .keywords` |
289 | 285 | ||
290 | # | 286 | # |
291 | # Tokenize & produce coverage report | 287 | # Tokenize & produce coverage report |
292 | # Use the filter you need | 288 | # Use the filter you need |
293 | # | 289 | # |
294 | print_info "[${BASENAME}] keywords filtering and produce coverage report" 3 | 290 | print_info "[${BASENAME}] keywords filtering and produce coverage report" 3 |
295 | # Default filter | 291 | # Default filter |
296 | cat $i | $SCRIPT_PATH/CleanFilter.sh | ${SCRIPT_PATH}/ApplyCorrectionRules.pl ${LEXICON}.regex | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t |\ | 292 | cat $i | $SCRIPT_PATH/CleanFilter.sh | ${SCRIPT_PATH}/ApplyCorrectionRules.pl ${LEXICON}.regex | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t |\ |
297 | $SCRIPT_PATH/CoverageReportMaker.pl --out $SOLR_RES/${basename}_tmp_report $LEXICON.bdlex_tok | 293 | $SCRIPT_PATH/CoverageReportMaker.pl --out $SOLR_RES/${basename}_tmp_report $LEXICON.bdlex_tok |
298 | # do less filter | 294 | # do less filter |
299 | #cat $i | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t | sed -f $RULES/preprocess.regex | sed -f $RULES/lastprocess.regex | $SCRIPT_PATH/CoverageReportMaker.pl --out $SOLR_RES/${basename}_tmp_report $LEXICON.bdlex_tok | 295 | #cat $i | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t | sed -f $RULES/preprocess.regex | sed -f $RULES/lastprocess.regex | $SCRIPT_PATH/CoverageReportMaker.pl --out $SOLR_RES/${basename}_tmp_report $LEXICON.bdlex_tok |
300 | 296 | ||
301 | 297 | ||
    #
    # Extract "real" OOVs and phonetize them
    # -> small ad-hoc filter to avoid picking up too much noise
    #
    print_info "[${BASENAME}] Extract OOVs and phonetize them" 3
    ${SCRIPT_PATH}/FindNormRules.pl $SOLR_RES/${basename}_tmp_report/report.oov $LEXICON.bdlex_tok | cut -f3 | grep -v "#" | grep -v "^[A-Z]\+$" | grep -v "^[0-9]" | grep --perl-regex -v "^([a-z']){1,3}$" | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -f | iconv -t ISO_8859-1 -f UTF-8 | ${LIA_LTBOX}/lia_phon/script/lia_lex2phon_variante | grep -v "core dumped" | cut -d"[" -f1 | sort -u | ${SCRIPT_PATH}/PhonFormatter.pl | iconv -f ISO_8859-1 -t UTF-8 | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t > $SOLR_RES/${basename}.phon_oov

    #
    # Search INVOC & OOV words in the current lattice
    #
    print_info "[${BASENAME}] Search INVOC and OOV in the current lattice" 3
    cat $SOLR_RES/${basename}_tmp_report/report.invoc | grep -v "\b0" | cut -f1 | grep --perl-regex -v "^[a-zA-Z']{1,3}$" | grep --perl-regex -v "^[a-zA-Z0-9]{1,3}$" | grep -v "<s>" | grep -v "</s>" | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t > $TRIGGER_CONFZONE/$basename.tosearch
    cat $SOLR_RES/${basename}.phon_oov | cut -f1 >> $TRIGGER_CONFZONE/$basename.tosearch

    # For each trellis
    for baseseg in $(cat "$SHOW_DIR/$basename.lst")
    do
        $OTMEDIA_HOME/tools/QUOTE_FINDER/bin/acousticFinder ${LEXICON}.speer_phon $RES_CONF/wlat/$baseseg.wlat $TRIGGER_CONFZONE/${basename}.tosearch $SOLR_RES/$basename.phon_oov > $TRIGGER_CONFZONE/$baseseg.acousticlyfound $OUTPUT_REDIRECTION
        #
        # Produce the boost file for the next decoding pass
        #
        print_info "[${BASENAME}] Produce trigg file: $baseseg" 3
        cat $RES_CONF_DIR/$baseseg.res | $SCRIPT_PATH/ScoreCtm2trigg.pl $TRIGGER_CONFZONE/$baseseg.acousticlyfound > $TRIGGER_CONFZONE/$baseseg.trigg
    done

done

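The short-token filter used in the pipelines above (`grep --perl-regex -v "^([a-z']){1,3}$"`) can be checked in isolation. This is a minimal sketch with made-up sample words, assuming GNU grep with PCRE support:

```shell
# Standalone check of the short-token filter used above:
# 1-3 character lowercase tokens (apostrophes included) are dropped as noise,
# longer words survive. The sample words are invented for illustration.
kept=$(printf "ab\nl'a\nmacron\n" | grep --perl-regex -v "^([a-z']){1,3}$")
echo "$kept"
# -> macron
```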
#-----------------------------------------------------------------------------------------------
# Build the extended SPEERAL lexicon
# 1) Merge the OOVs with the LEXICON
# 2) Related texts are collected in order to find the in-vocabulary candidate word maximizing the ppl (LM probability)
# 3) The current lexicon is extended with all the valid OOVs
#-----------------------------------------------------------------------------------------------
print_info "[${BASENAME}] Build extended Speeral lexicon" 1
mkdir -p $EXT_LEX/final
mkdir -p $EXT_LEX/tmp
mkdir -p $EXT_LEX/tmp/txt
#
# Collect the acoustically found OOVs and their phonetizations
#
print_info "[${BASENAME}] Get all OOVs and retrieve all phonetizations" 3
for i in `ls $SOLR_RES/*.phon_oov`
do
    basename=`basename $i .phon_oov`

    rm $EXT_LEX/$basename.acousticlyfound 2> /dev/null
    # List the acoustically found words for the show
    for baseseg in $(cat "$SHOW_DIR/$basename.lst")
    do
        cat $TRIGGER_CONFZONE/$baseseg.acousticlyfound | cut -f1 | cut -f2 -d"=" >> $EXT_LEX/$basename.acousticlyfound
    done
    cat $EXT_LEX/$basename.acousticlyfound | sort -u > $EXT_LEX/.tmp
    mv $EXT_LEX/.tmp $EXT_LEX/$basename.acousticlyfound

    #
    # Extract the OOVs really added
    #
    cat $SOLR_RES/$basename.phon_oov | cut -f1 | sort -u > $EXT_LEX/$basename.oov
    $SCRIPT_PATH/intersec.pl $EXT_LEX/$basename.oov $EXT_LEX/$basename.acousticlyfound > $EXT_LEX/$basename.oov_acousticlyfound
    #
    # Retrieve all phonetizations
    #
    cat $SOLR_RES/${basename}.phon_oov | $SCRIPT_PATH/LexPhonFilter.pl $EXT_LEX/$basename.oov_acousticlyfound > $EXT_LEX/$basename.oov_acousticlyfound_phon
done

#
# Merge the OOVs and their phonetizations
#
print_info "[${BASENAME}] Merge OOVs and their phonetizations" 3
lexname=$(basename $LEXICON)
cat $EXT_LEX/*.oov_acousticlyfound_phon | sort -u > $EXT_LEX/final/all.oov_acousticlyfound_phon
cat $EXT_LEX/*.oov_acousticlyfound | sort -u | grep --perl-regex -v "^([a-z']){3}$" > $EXT_LEX/final/all.oov_acousticlyfound
$SCRIPT_PATH/MergeLexicon.pl $EXT_LEX/final/all.oov_acousticlyfound_phon > $EXT_LEX/final/${lexname}_ext.phon

#
# Collect and clean the retrieved text
#
print_info "[${BASENAME}] Collect and clean SOLR txt answers" 2
# Choose a filter
# Default filter
cat $SOLR_RES/*.txt | $SCRIPT_PATH/CleanFilter.sh | $SCRIPT_PATH/ApplyCorrectionRules.pl ${LEXICON}.regex | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t > $EXT_LEX/final/all.bdlex_txt
# Lighter filter
#cat $SOLR_RES/*.txt | $SCRIPT_PATH/BdlexUC.pl $RULES/basic -t | sed -f $RULES/preprocess.regex | sed -f $RULES/lastprocess.regex > $EXT_LEX/final/all.bdlex_txt

#
# Construct the map file
#
# Notes:
# - Expected format:
#   <WORD1_STRING> <CANDIDATE1_STRING> <PHON_1>
#
print_info "[${BASENAME}] Construct map file" 3
rm -f $EXT_LEX/final/${lexname}_ext.map 2>/dev/null
rm -f $EXT_LEX/final/${lexname}.unvalid_oov 2>/dev/null

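As a hedged illustration of the expected map format, the loop below turns a "<word>\t<phon>" entry plus a candidate word into "<word>\t<candidate>\t<phon>" with a single sed substitution. The word, candidate and phonetization here are invented values, not taken from a real lexicon (GNU sed is assumed for the `\t` escapes):

```shell
# Sketch of one map-line construction: replace the first tab of a
# "<word>\t<phon>" entry with "\t<candidate>\t". Illustrative values only.
candidate="exemple"
phonLine=$(printf 'exemplov\teeggzzaanpllov')
mapLine=$(echo "$phonLine" | sed "s|\t|\t$candidate\t|")
printf '%s\n' "$mapLine"
```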
while read oov
do
    oov=`echo $oov | tr -d '\n'`
    #
    # Obtain the OOV's tag
    #
    #oov_tag=`grep --perl-regex "^$oov\t" $DYNAMIC_TAGSTATS/all.tags | cut -f2`
    #
    # Try to collect text containing the OOV word
    #
    print_info "[${BASENAME}] Collect text containing the OOV" 3
    cat $EXT_LEX/final/all.bdlex_txt | grep --perl-regex " $oov " | $SCRIPT_PATH/NbMaxWordsFilter.pl 40 | uniq > $EXT_LEX/tmp/txt/$oov.bdlex_txt
    if [ -f $EXT_LEX/tmp/txt/$oov.bdlex_txt ]; then
        nbWords=`wc -l $EXT_LEX/tmp/txt/$oov.bdlex_txt | cut -f1 -d" "`
        if [ $nbWords -eq 0 ]; then
            print_warn "[${BASENAME}] INVALID OOV: $oov => $nbWords occurrences" 2
            echo "$oov" >> $EXT_LEX/final/${lexname}.unvalid_oov
        else
            #
            # Find a candidate in a filtered in-vocabulary lexicon => the candidate which maximizes the ppl over the collected text
            #
            #echo "$/getCandidate $SPEER_LM_PATH $SPEER_LM_BASENAME $oov $LEXICON.bdlex_tok $EXT_LEX/tmp/txt/$oov.bdlex_txt"
            print_info `$SPEERAL_PATH/bin/getCandidate $SPEER_LM_PATH $SPEER_LM_BASENAME $oov $CANDIDATE_LEXICON $EXT_LEX/tmp/txt/$oov.bdlex_txt | cut -f1 -d" "` 3
            candidate=`$SPEERAL_PATH/bin/getCandidate $SPEER_LM_PATH $SPEER_LM_BASENAME $oov $CANDIDATE_LEXICON $EXT_LEX/tmp/txt/$oov.bdlex_txt | cut -f1 -d" "`
            if [ ! "$candidate" == "" ]; then
                grep --perl-regex "^$oov\t" $EXT_LEX/final/all.oov_acousticlyfound_phon > $EXT_LEX/tmp/$oov.phon
                while read phonLine
                do
                    # <word> <phon> => <word> <candidate> <phon>
                    echo "$phonLine" | sed "s|\t|\t$candidate\t|" >> $EXT_LEX/final/${lexname}_ext.map
                done < $EXT_LEX/tmp/$oov.phon
            else
                print_warn "[${BASENAME}] INVALID OOV: $oov => no available candidate word in the LM" 2
                echo "$oov" >> $EXT_LEX/final/${lexname}.unvalid_oov
            fi
        fi
    else
        print_warn "[${BASENAME}] INVALID OOV: $oov" 2
        echo "$oov" >> $EXT_LEX/final/${lexname}.unvalid_oov
    fi
done < $EXT_LEX/final/all.oov_acousticlyfound

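The first rejection branch in the loop above boils down to a line count on the collected text. A minimal standalone sketch of that test (the temporary path is illustrative only; GNU `wc` output format is assumed, as in the script):

```shell
# Sketch of the validity test: an OOV is rejected ("INVALID") when no
# supporting sentence was collected, i.e. its .bdlex_txt file is empty.
tmpdir=$(mktemp -d)
: > "$tmpdir/oov.bdlex_txt"
nbWords=$(wc -l "$tmpdir/oov.bdlex_txt" | cut -f1 -d" ")
if [ "$nbWords" -eq 0 ]; then status="INVALID"; else status="VALID"; fi
echo "$status"
rm -r "$tmpdir"
```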
#
### Speeral
#

lexname=`basename $LEXICON`
#
# Build the final trigger files
#
print_info "[${BASENAME}] Clean trigg files" 3
mkdir -p $TRIGGER_CONFZONE/speeral/ 2> /dev/null
mkdir -p $EXT_LEX/speeral/ 2> /dev/null
for i in `ls $TRIGGER_CONFZONE/*.trigg`
do
    basename=`basename $i .trigg`
    cat $i | $SCRIPT_PATH/RemoveLineContaining.pl $EXT_LEX/$lexname.unvalid_oov > $TRIGGER_CONFZONE/speeral/$basename.trigg
done
#
# Compile the Speeral extended lexicon
#
print_info "[${BASENAME}] Compile Speeral extended lexicon" 3
print_info "$SPEERAL_PATH/bin/buildmappedbinode $LEXICON.bdlex_phon $EXT_LEX/final/${lexname}_ext.map $AM_SKL $EXT_LEX/speeral/${lexname}_ext" 3
$SPEERAL_PATH/bin/buildmappedbinode $LEXICON.bdlex_phon $EXT_LEX/final/${lexname}_ext.map $AM_SKL $EXT_LEX/speeral/${lexname}_ext

if [ $CHECK -eq 1 ]
then
    check_exploitconfpass_lex_check "${EXT_LEX}/speeral/${lexname}_ext"
    if [ $? -eq 1 ]
    then
        print_error "[${BASENAME}] Building Speeral lexicon failed for $INPUT_DIR -> exit"
        print_error "[${BASENAME}] Check $ERRORFILE"
        print_log_file $ERRORFILE "ERROR : Building Speeral lexicon $INPUT_DIR"
        print_log_file $ERRORFILE "ERROR : ${EXT_LEX}/speeral/${lexname}_ext empty after buildmappedbinode?"
        exit 1;
    fi
fi


#-------#
# CLOSE #
#-------#
# Seems OK
print_info "[${BASENAME}] <= ExploitConfidencePass End | $(date +'%d/%m/%y %H:%M:%S')" 1

# Unlock the directory
mv "$OUTPUT_DIR/EXPLOITCONFPASS.lock" "$OUTPUT_DIR/EXPLOITCONFPASS.unlock"