Commit 5a14b86787e34200360c3f5d4ef4616d7fceccf3

Authored by Jean-François Rey
1 parent c00c2f1895
Exists in master

update doc

Showing 5 changed files with 60 additions and 14 deletions

1 #---------------# 1 #---------------#
2 # OTMEDIA LIA # 2 # OTMEDIA LIA #
3 # HOWTO # 3 # HOWTO #
4 # version 1.0 # 4 # version 1.0 #
5 #---------------# 5 #---------------#
6 6
7 1\ Main scripts options 7 1\ Main scripts options
8 ----------------------- 8 -----------------------
9 9
10 There are five main options for otmedia scripts. 10 There are five main options for otmedia scripts.
11 -h : for help 11 -h : for help
12 -D : Debug mode 12 -D : Debug mode
13 -v n : Verbose mode 1 low to 3 high 13 -v n : Verbose mode 1 low to 3 high
14 -c : Check results 14 -c : Check results
15 -r : force to rerun a script, without deleting work already done 15 -r : force to rerun a script, without deleting work already done
16 16
17 2\ Main scripts 17 2\ Main scripts
18 --------------- 18 ---------------
19 2.1\ FirstPass.sh 19 2.1\ FirstPass.sh
20 ----------------- 20 -----------------
21 21
22 FirstPass.sh performs speaker diarization and transcription of an audio file. It first converts the file to wav format (16000 Hz, 16 bits, mono) if this is not already done. 22 FirstPass.sh performs speaker diarization and transcription of an audio file. It first converts the file to wav format (16000 Hz, 16 bits, mono) if this is not already done.
23 If a .SRT file is present in the same directory as the audio file, it is copied too. 23 If a .SRT file is present in the same directory as the audio file, it is copied too.
24 24
25 $> FirstPass.sh [options] 110624FR2_20002100.wav result_directory 25 $> FirstPass.sh [options] 110624FR2_20002100.wav result_directory
26 26
27 Options: 27 Options:
28 -f n : number of forks for speeral 28 -f n : number of forks for speeral
29 29
30 Output : result_directory/110624FR2_20002100/res_p1/ 30 Output : result_directory/110624FR2_20002100/res_p1/
31 and .ctm, .trs and .txt files.
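For reference, the target format mentioned above (16000 Hz, 16 bits, mono) can also be produced by hand with avconv, which the README lists as a dependency. This is only a sketch: the function name to_wav16k is hypothetical, and the options FirstPass.sh really passes to avconv are not shown in this document and may differ.

```shell
#!/bin/bash
# Hypothetical helper (not part of OTMEDIA): convert any audio file that
# avconv can read into a 16000 Hz, 16-bit, mono wav file.
to_wav16k() {
    local input="$1" output="$2"
    # -ar 16000          : sample rate of 16000 Hz
    # -ac 1              : one channel (mono)
    # -acodec pcm_s16le  : signed 16-bit little-endian PCM
    avconv -i "$input" -ar 16000 -ac 1 -acodec pcm_s16le "$output"
}
```

A call such as `to_wav16k show.mp3 show.wav` would then give a file that should not need converting again.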
31 32
32 2.2\ SecondPass.sh 33 2.2\ SecondPass.sh
33 ------------------ 34 ------------------
34 35
35 SecondPass.sh performs speaker adaptation and transcription based on the first pass. 36 SecondPass.sh performs speaker adaptation and transcription based on the first pass.
36 37
37 $> SecondPass.sh [options] result_directory/110624FR2_20002100/ 38 $> SecondPass.sh [options] result_directory/110624FR2_20002100/
38 39
39 Options: 40 Options:
40 -f n : number of forks for speeral 41 -f n : number of forks for speeral
41 42
42 Output : result_directory/110624FR2_20002100/res_p2/ 43 Output : result_directory/110624FR2_20002100/res_p2/
44 and .ctm, .trs and .txt files.
43 45
44 2.3\ ConfPass.sh 46 2.3\ ConfPass.sh
45 ---------------- 47 ----------------
46 48
47 ConfPass.sh computes confidence measures from the second or third pass. 49 ConfPass.sh computes confidence measures from the second or third pass.
48 50
49 $> ConfPass.sh [options] result_directory/110624FR2_20002100/ <res_p2|res_p3> 51 $> ConfPass.sh [options] result_directory/110624FR2_20002100/ <res_p2|res_p3>
50 52
51 Output : result_directory/110624FR2_20002100/conf/res_p2/scored_ctm/ 53 Output : result_directory/110624FR2_20002100/conf/res_p2/scored_ctm/
52 and result_directory/110624FR2_20002100.usf file 54 and result_directory/110624FR2_20002100.usf file
53 55
54 2.4\ ExploitConfidencePass.sh 56 2.4\ ExploitConfidencePass.sh
55 ----------------------------- 57 -----------------------------
56 58
57 It uses the confidence pass measures to : 59 It uses the confidence pass measures to :
58 - boost confident zones 60 - boost confident zones
59 - find alternatives in non-confident zones (using SOLR DB) 61 - find alternatives in non-confident zones (using SOLR DB)
60 - extend the lexicon 62 - extend the lexicon
61 63
62 $> ExploitConfidencePass.sh [options] result_directory/110624FR2_20002100 64 $> ExploitConfidencePass.sh [options] result_directory/110624FR2_20002100
63 65
64 Output : result_directory/110624FR2_20002100/trigg/speeral 66 Output : result_directory/110624FR2_20002100/trigg/speeral
65 result_directory/110624FR2_20002100/LEX/speeral/_ext 67 result_directory/110624FR2_20002100/LEX/speeral/_ext
66 68
67 2.5\ ThirdPass.sh 69 2.5\ ThirdPass.sh
68 ------------------ 70 ------------------
69 71
70 ThirdPass.sh transcribes using the SecondPass speaker adaptation, the ExploitConfidencePass trigg files and the new lexicon. 72 ThirdPass.sh transcribes using the SecondPass speaker adaptation, the ExploitConfidencePass trigg files and the new lexicon.
71 73
72 $> ThirdPass.sh [options] result_directory/110624FR2_20002100/ 74 $> ThirdPass.sh [options] result_directory/110624FR2_20002100/
73 75
74 Options : 76 Options :
75 -f n : number of forks for speeral 77 -f n : number of forks for speeral
76 78
77 Output : result_directory/110624FR2_20002100/conf/res_p3 79 Output : result_directory/110624FR2_20002100/conf/res_p3
80 and .ctm, .trs and .txt files.
78 81
79 2.6\ RecomposePass.sh 82 2.6\ RecomposePass.sh
80 -------------------- 83 --------------------
81 84
82 RecomposePass.sh copies results that are missing in ThirdPass from the Second and First Pass. 85 RecomposePass.sh copies results that are missing in ThirdPass from the Second and First Pass.
83 86
84 $> RecomposePass.sh [options] result_directory/110624FR2_20002100/ 87 $> RecomposePass.sh [options] result_directory/110624FR2_20002100/
85 88
86 Output : result_directory/110624FR2_20002100/res_all 89 Output : result_directory/110624FR2_20002100/res_all
90 and .ctm, .trs and .txt files.
87 91
88 2.7\ ScoringRes.sh 92 2.7\ ScoringRes.sh
89 ------------------ 93 ------------------
90 94
91 ScoringRes.sh runs several scoring tools to score the results, using the SRT file if it exists. 95 ScoringRes.sh runs several scoring tools to score the results, using the SRT file if it exists.
92 96
93 $> ScoringRes.sh [options] result_directory/110624FR2_20002100/ 97 $> ScoringRes.sh [options] result_directory/110624FR2_20002100/
94 98
95 Output : result_directory/110624FR2_20002100/scoring 99 Output : result_directory/110624FR2_20002100/scoring
96 100
97 2.8\ CheckResults.sh 101 2.8\ CheckResults.sh
98 -------------------- 102 --------------------
99 103
100 CheckResults.sh parses the result directories to summarize the work already done. 104 CheckResults.sh parses the result directories to summarize the work already done.
101 105
102 $> CheckResults.sh [options] result_directory 106 $> CheckResults.sh [options] result_directory
103 107
104 Output : "Directory name #plp #res_p1 #treil_p2 #treil_p3 usf_p2 usf_p3" 108 Output : "Directory name #plp #res_p1 #treil_p2 #treil_p3 usf_p2 usf_p3"
105 #plp number of plp files 109 #plp number of plp files
106 #res_p1 number of .res files at first pass 110 #res_p1 number of .res files at first pass
107 #treil_p2 number of .treil files at second pass 111 #treil_p2 number of .treil files at second pass
108 #treil_p3 number of .treil files at third pass 112 #treil_p3 number of .treil files at third pass
109 usf_p2 usf file from confidence pass result on second pass (OK|ERR|NAN) 113 usf_p2 usf file from confidence pass result on second pass (OK|ERR|NAN)
110 usf_p3 usf file from confidence pass result on third pass (OK|ERR|NAN) 114 usf_p3 usf file from confidence pass result on third pass (OK|ERR|NAN)
111 115
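The main scripts above can be chained by hand, in the order the README gives for the passes. The sketch below is illustrative only: run_passes and the RUN variable are hypothetical names, the scripts are assumed to be on the PATH, and no script options are passed. OneScriptToRuleThemAll.sh (section 3) is the supported way to run everything in one call.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of OTMEDIA): run the main scripts one after
# the other and stop at the first failure.
# Set RUN=echo to print the commands instead of executing them.
run_passes() {
    local wav="$1" out="$2" show
    # Per-show result directory, e.g. result_directory/110624FR2_20002100
    show="$out/$(basename "$wav" .wav)"
    ${RUN:-} FirstPass.sh "$wav" "$out" &&
    ${RUN:-} SecondPass.sh "$show/" &&
    ${RUN:-} ConfPass.sh "$show/" res_p2 &&
    ${RUN:-} ExploitConfidencePass.sh "$show" &&
    ${RUN:-} ThirdPass.sh "$show/" &&
    ${RUN:-} RecomposePass.sh "$show/" &&
    ${RUN:-} ScoringRes.sh "$show/"
}
```

Running `RUN=echo run_passes 110624FR2_20002100.wav result_directory` prints the seven commands without launching anything, which is a cheap way to check the paths first.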
112 3\ OneScriptToRuleThemAll.sh 116 3\ OneScriptToRuleThemAll.sh
113 ---------------------------- 117 ----------------------------
114 118
115 This script runs all OTMEDIA LIA passes in one call. 119 This script runs all OTMEDIA LIA passes in one call.
116 120
117 $> OneScriptToRuleThemAll.sh [options] 110624FR2_20002100.wav result_directory 121 $> OneScriptToRuleThemAll.sh [options] 110624FR2_20002100.wav result_directory
118 122
119 Options : (default options are available) 123 Options : (default options are available)
120 -a Do every pass 124 -a Do every pass
121 -1 Do First pass 125 -1 Do First pass
122 -2 Do Second pass 126 -2 Do Second pass
123 -3 Do Third pass 127 -3 Do Third pass
124 -C Do Confidence pass 128 -C Do Confidence pass
125 -e Do Exploit Confidence pass 129 -e Do Exploit Confidence pass
126 -R Do Recompose pass 130 -R Do Recompose pass
127 -s Do Scoring pass 131 -s Do Scoring pass
128 132
129 4\ Modify configuration 133 4\ Modify configuration
134 -----------------------
130 135
136 Most of the main scripts have a configuration file (cfg/ directory).
137 You can change script behaviour and the data used.
138 The Speeral configuration files can also be changed (tools/Speeral/CFG/ directory).
139
131 4.1\ Scripts configurations 140 4.1\ Scripts configurations
141
142 In the script configuration files (OTMEDIA_HOME/cfg/) you can change default options such as architecture, verbosity ...
143 The configuration files of scripts that use Speeral also specify the binaries, the model paths and names, and the Speeral configuration file.
144
132 4.2\ Speeral configurations 145 4.2\ Speeral configurations
133 146
147 Speeral configuration files are in the OTMEDIA_HOME/tools/Speeral/CFG directory.
148 The .tmp files are used by install.sh to generate the .xml files.
149 You can modify the .xml files for your needs, but most data settings are passed as arguments when the scripts call Speeral.
150
134 5\ Modify binaries 151 5\ Modify binaries
152 ------------------
153
154 Binaries can be found in the bin and tools directories.
155 Some binaries are compiled in both 32 and 64 bits. By default all binaries are compiled in 32 bits.
156 You can update binaries as you need.
157
158 To modify the tools binaries, you need to download a compatible version.
159 lia_ltbox can be found in /labo/Tools/
160 Speeral (binaries) can be compiled from the git remote git@gitlia.univ-avignon.fr:vaudriguard/libspeeral.git . Do not modify Speeral data from OTMEDIA (unless you know what you are doing).
161 In PACKAGE_MESURES_V1.0 you can update the icsiboost binary (in bin) from the project page : https://code.google.com/p/icsiboost/
162 For QUOTE_FINDER and SIGMUND please contact support.
163
164 Good Luck ! Luke !
165 And may the force be with you !
135 166
1 #---------------# 1 #---------------#
2 # OTMEDIA LIA # 2 # OTMEDIA LIA #
3 # INSTALL # 3 # INSTALL #
4 # version : 1.0 # 4 # version : 1.0 #
5 #---------------# 5 #---------------#
6 6
7 OTMEDIA LIA ready to use ? Really ? 7 OTMEDIA LIA ready to use ? Really ?
8 No ! You have to configure some features manually. 8 No ! You have to configure some features manually.
9 Let's see... 9 Let's see...
10 10
11 SUMMARY 11 SUMMARY
12 ------- 12 -------
13 13
14 1\ Before installation 14 1\ Before installation
15 2\ install.sh script 15 2\ install.sh script
16 3\ SOLR install 16 3\ SOLR install
17 4\ Install descriptions
17 18
18 19
19 1\ Before installation 20 1\ Before installation
20 ---------------------- 21 ----------------------
21 22
22 - Check and install dependencies. 23 - Check and install dependencies.
23 - On a 64-bit architecture, be sure you can run 32-bit programs. 24 - On a 64-bit architecture, be sure you can run 32-bit programs.
24 - Have 300 GB of free space. 25 - Have 300 GB of free space.
25 - Have access to the network and the nyx server. 26 - Have access to the network and the nyx server.
26 27
27 2/ install.sh script 28 2\ install.sh script
28 -------------------- 29 --------------------
29 30
30 install.sh script will do most of the work. 31 install.sh script will do most of the work.
31 It will check dependencies and configure pass tools. 32 It will check dependencies and configure pass tools.
32 By default it will do a complete install (300 GB). 33 By default it will do a complete install (300 GB).
33 34
34 You can modify its behavior by editing install.sh : 35 You can modify its behavior by editing install.sh :
35 36
36 To disable lexicon adaptation using SOLR DB, set EXPLOITCONFPASS to 0 (this accounts for about 290 GB). 37 To disable lexicon adaptation using SOLR DB, set EXPLOITCONFPASS to 0 (this accounts for about 290 GB).
37 To disable confidence measure, set CONFPASS to 0. 38 To disable confidence measure, set CONFPASS to 0.
38 To disable second and third pass, set PASS2 to 0. 39 To disable second and third pass, set PASS2 to 0.
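The switches above are plain shell variables near the top of install.sh, so they can be changed with an editor or with sed. The helper below is a sketch: disable_pass is a hypothetical name, and it assumes each switch is still written as NAME=1 at the start of a line.

```shell
#!/bin/bash
# Hypothetical helper (not part of OTMEDIA): set one of the install.sh
# switches (PASS2, CONFPASS, EXPLOITCONFPASS) from 1 to 0.
# The untouched script is kept next to it with a .bak suffix.
disable_pass() {
    local flag="$1" script="${2:-install.sh}"
    sed -i.bak "s/^${flag}=1/${flag}=0/" "$script"
}
```

For example, `disable_pass EXPLOITCONFPASS` run from the OTMEDIA directory skips the SOLR-related part of the install.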
39 40
41 If your login name differs from your nyx login name, edit install.sh and change the username variable to your nyx login name.
42
40 run install.sh and follow the white rabbit. 43 run install.sh and follow the white rabbit.
41 44
42 3\ SOLR install 45 3\ SOLR install
43 --------------- 46 ---------------
44 47
45 The install.sh script downloads otmedia-2013-04.tar.gz and untars it in OTMEDIA_HOME/tools/SOLR/ . 48 The install.sh script downloads otmedia-2013-04.tar.gz and untars it in OTMEDIA_HOME/tools/SOLR/ .
46 See SOLR.INSTALL file to install OTMEDIA SOLR DB. 49 See SOLR.INSTALL file to install OTMEDIA SOLR DB.
47 50
48 4\ Install descriptions 51 4\ Install descriptions
52 -----------------------
49 53
50 OTMEDIA_HOME 54 OTMEDIA_HOME
51 |-> bin/ 55 |-> bin/
52 |-> aff_mat 56 |-> aff_mat
53 |-> aff_mat.64 57 |-> aff_mat.64
54 |-> lia_plp_mt 58 |-> lia_plp_mt
55 |-> lia_plp_mt.64 59 |-> lia_plp_mt.64
56 |-> LIUM_SpkDiarization-4.2.jar 60 |-> LIUM_SpkDiarization-4.2.jar
57 |-> sclite 61 |-> sclite
58 |-> cfg/ 62 |-> cfg/ # Main scripts configurations files
59 |-> ConfidenceMeasure.cfg 63 |-> ConfidenceMeasure.cfg
60 |-> ConfPass.cfg 64 |-> ConfPass.cfg
61 |-> ExploitConfidencePass.cfg 65 |-> ExploitConfidencePass.cfg
62 |-> FirstPass.cfg 66 |-> FirstPass.cfg
63 |-> main_cfg.cfg 67 |-> main_cfg.cfg
64 |-> RecomposePass.cfg 68 |-> RecomposePass.cfg
65 |-> Scoring.cfg 69 |-> Scoring.cfg
66 |-> SecondPass.cfg 70 |-> SecondPass.cfg
67 |-> ThirdPass.cfg 71 |-> ThirdPass.cfg
68 |-> data/ 72 |-> data/ # Some data
69 |-> rules/ 73 |-> rules/
70 |-> asupp 74 |-> asupp
71 |-> basic 75 |-> basic
72 |-> lastprocess.regex 76 |-> lastprocess.regex
73 |-> muRules.tab 77 |-> muRules.tab
74 |-> numeric_rules 78 |-> numeric_rules
75 |-> postprocess.regex 79 |-> postprocess.regex
76 |-> preprocess.regex 80 |-> preprocess.regex
77 |-> random_regex.tab 81 |-> random_regex.tab
78 |-> main_tools/ 82 |-> main_tools/ # Main scripts
79 |-> CheckResults.sh 83 |-> CheckResults.sh
80 |-> ConfidenceMeasure.sh 84 |-> ConfidenceMeasure.sh
81 |-> ConfPass.sh 85 |-> ConfPass.sh
82 |-> ExploitConfidencePass.sh 86 |-> ExploitConfidencePass.sh
83 |-> FirstPass.sh 87 |-> FirstPass.sh
84 |-> OneScriptToRuleThemAll.sh 88 |-> OneScriptToRuleThemAll.sh
85 |-> RecomposePass.sh 89 |-> RecomposePass.sh
86 |-> ScoringRes.sh 90 |-> ScoringRes.sh
87 |-> SecondPass.sh 91 |-> SecondPass.sh
88 |-> ThirdPass.sh 92 |-> ThirdPass.sh
89 |-> tools/ 93 |-> tools/ # Tools
90 |-> lia_ltbox/ 94 |-> lia_ltbox/
91 |-> PACKAGE_MESURES_V1.0/ 95 |-> PACKAGE_MESURES_V1.0/
92 |-> QUOTE_FINDER/ 96 |-> QUOTE_FINDER/
93 |-> scripts/ 97 |-> scripts/ # Secondary scripts (but useful)
94 |-> ApplyCorrectionRules.pl 98 |-> ApplyCorrectionRules.pl
95 |-> BdlexUC.pl 99 |-> BdlexUC.pl
96 |-> CheckConfPass.sh 100 |-> CheckConfPass.sh
97 |-> CheckExploitConfPass.sh 101 |-> CheckExploitConfPass.sh
98 |-> CheckFirstPass.sh 102 |-> CheckFirstPass.sh
99 |-> CheckSecondPass.sh 103 |-> CheckSecondPass.sh
100 |-> CheckThirdPass.sh 104 |-> CheckThirdPass.sh
101 |-> CleanFilter.sh 105 |-> CleanFilter.sh
102 |-> CoverageReportMaker.pl 106 |-> CoverageReportMaker.pl
103 |-> ctm2show.pl 107 |-> ctm2show.pl
104 |-> Date2txt.pl 108 |-> Date2txt.pl
105 |-> daybefore2after.sh 109 |-> daybefore2after.sh
106 |-> ExtractAudioFromTV.sh 110 |-> ExtractAudioFromTV.sh
107 |-> FindNormRules.pl 111 |-> FindNormRules.pl
108 |-> formatRES.pl 112 |-> formatRES.pl
109 |-> GenerateSOLRQueries.pl 113 |-> GenerateSOLRQueries.pl
110 |-> intersec.pl 114 |-> intersec.pl
111 |-> KeepConfZone.pl 115 |-> KeepConfZone.pl
112 |-> LexPhonFilter.pl 116 |-> LexPhonFilter.pl
113 |-> MergeLexicon.pl 117 |-> MergeLexicon.pl
114 |-> NbMaxWordsFilter.pl 118 |-> NbMaxWordsFilter.pl
115 |-> Number2txt.pl 119 |-> Number2txt.pl
116 |-> perlmod/ 120 |-> perlmod/
117 |-> Utils.pm 121 |-> Utils.pm
118 |-> PhonFormatter.pl 122 |-> PhonFormatter.pl
119 |-> ProcessSOLRQueries.py 123 |-> ProcessSOLRQueries.py
120 |-> RandomRegex.pl 124 |-> RandomRegex.pl
121 |-> RemoveLineContaining.pl 125 |-> RemoveLineContaining.pl
122 |-> res2out.pl 126 |-> res2out.pl
123 |-> ScoreCtm2trigg.pl 127 |-> ScoreCtm2trigg.pl
124 |-> scoredCtmAndTaggedLem2All.pl 128 |-> scoredCtmAndTaggedLem2All.pl
125 |-> Sentencer.pl 129 |-> Sentencer.pl
126 |-> srt2stm.pl 130 |-> srt2stm.pl
127 |-> Tools.sh 131 |-> Tools.sh
128 |-> UrlConverter.pl 132 |-> UrlConverter.pl
129 |-> SIGMUND/ 133 |-> SIGMUND/
134 |-> SOLR/
135 |-> Speeral/
130 |-> COPYING 136 |-> COPYING
131 |-> CorpusOTMedia.txt 137 |-> CorpusOTMedia.txt
132 |-> HOWTO 138 |-> HOWTO
133 |-> INSTALL 139 |-> INSTALL
134 |-> README 140 |-> README
135 |-> SOLR.INSTALL 141 |-> SOLR.INSTALL
136 |-> TODO 142 |-> TODO
137 143
138 144
139 145
140 146
141 147
142 148
143 149
144 150
145 151
146 152
147 153
148 154
149 155
1 ___ _____ __ __ _____ ____ ___ _ _ ___ _ 1 ___ _____ __ __ _____ ____ ___ _ _ ___ _
2 / _ \_ _| \/ | ____| _ \_ _| / \ | | |_ _| / \ 2 / _ \_ _| \/ | ____| _ \_ _| / \ | | |_ _| / \
3 | | | || | | |\/| | _| | | | | | / _ \ | | | | / _ \ 3 | | | || | | |\/| | _| | | | | | / _ \ | | | | / _ \
4 | |_| || | | | | | |___| |_| | | / ___ \ | |___ | | / ___ \ 4 | |_| || | | | | | |___| |_| | | / ___ \ | |___ | | / ___ \
5 \___/ |_| |_| |_|_____|____/___/_/ \_\ |_____|___/_/ \_\ 5 \___/ |_| |_| |_|_____|____/___/_/ \_\ |_____|___/_/ \_\
6 6
7 7
8 #---------------# 8 #---------------#
9 # OTMEDIA LIA # 9 # OTMEDIA LIA #
10 # README # 10 # README #
11 # version 1.0 # 11 # version 1.0 #
12 #---------------# 12 #---------------#
13 13
14 DESCRIPTION 14 DESCRIPTION
15 ----------- 15 -----------
16 16
17 OTMEDIA stands for "Observatoire Transmedia". Its main objective is to study the evolution and transformation of the media world. 17 OTMEDIA stands for "Observatoire Transmedia". Its main objective is to study the evolution and transformation of the media world.
18 The scientific objective of the project is to create a new generation of media observatory 18 The scientific objective of the project is to create a new generation of media observatory
19 based on an interactive (semi-automatic) transmedia analysis system, in order to understand 19 based on an interactive (semi-automatic) transmedia analysis system, in order to understand
20 the world of information and its developments. 20 the world of information and its developments.
21 21
22 Web Site : http://www.otmedia.fr 22 Web Site : http://www.otmedia.fr
23 23
24 OTMEDIA LIA project is a set of tools to transcribe radio and TV shows. 24 OTMEDIA LIA project is a set of tools to transcribe radio and TV shows.
25 It does multiple things : 25 It does multiple things :
26 - First pass : default transcription with speeral and speaker diarization. 26 - First pass : default transcription with speeral and speaker diarization.
27 - Second pass : speaker adaptation and a second transcription pass with speeral. 27 - Second pass : speaker adaptation and a second transcription pass with speeral.
28 - Confidence pass : compute confidence measures from the transcription output. 28 - Confidence pass : compute confidence measures from the transcription output.
29 - Exploit Confidence Measure : use SOLR DB data to extend the lexicon where confidence is low and create trigg files. 29 - Exploit Confidence Measure : use SOLR DB data to extend the lexicon where confidence is low and create trigg files.
30 - Third pass : second pass using the new lexicon and trigg files. 30 - Third pass : second pass using the new lexicon and trigg files.
31 31
32 From GIT : git@gitlia.univ-avignon.fr/otmedia.git
32 33
33 DEPENDENCIES 34 DEPENDENCIES
34 ------------ 35 ------------
35 36
36 GNU Toolchain 37 GNU Toolchain
37 Available from : http://www.gnu.org 38 Available from : http://www.gnu.org
38 and debian packages 39 and debian packages
39 40
40 Compiling, linking, and building applications. 41 Compiling, linking, and building applications.
41 42
42 43
43 avconv (libav-tools >= 0.8) 44 avconv (libav-tools >= 0.8)
44 Available from : http://libav.org 45 Available from : http://libav.org
45 and debian package 46 and debian package
46 47
47 avconv is a very fast video and audio converter. 48 avconv is a very fast video and audio converter.
48 49
49 JAVA JDK and JRE ( >= 6) 50 JAVA JDK and JRE ( >= 6)
50 Available from : http://www.oracle.com 51 Available from : http://www.oracle.com
51 and debian packages 52 and debian packages
52 53
53 JAVA Development Kit and JAVA Runtime Environment. 54 JAVA Development Kit and JAVA Runtime Environment.
54 55
55 Python ( >= 2.7.0) 56 Python ( >= 2.7.0)
56 Available from : http://www.python.org/ 57 Available from : http://www.python.org/
57 and debian packages 58 and debian packages
58 59
59 Python is a programming language. 60 Python is a programming language.
60 61
61 Perl ( >= 5.0.0) 62 Perl ( >= 5.0.0)
62 Available from : http://www.perl.org/ 63 Available from : http://www.perl.org/
63 and debian packages 64 and debian packages
64 65
65 Perl is a programming language. 66 Perl is a programming language.
66 67
67 iconv ( >= 2.0.0) 68 iconv ( >= 2.0.0)
68 Available from : http://www.gnu.org 69 Available from : http://www.gnu.org
69 and debian package 70 and debian package
70 71
71 Character set conversion. 72 Character set conversion.
72 73
73 csh shell (csh) 74 csh shell (csh)
74 Available on debian packages. 75 Available on debian packages.
75 76
76 The C shell was originally written at UCB to overcome limitations in the 77 The C shell was originally written at UCB to overcome limitations in the
77 Bourne shell. Its flexibility and comfort (at that time) quickly made it 78 Bourne shell. Its flexibility and comfort (at that time) quickly made it
78 the shell of choice until more advanced shells like ksh, bash, zsh or 79 the shell of choice until more advanced shells like ksh, bash, zsh or
79 tcsh appeared. Most of the latter incorporate features original to csh. 80 tcsh appeared. Most of the latter incorporate features original to csh.
80 81
81 The SRI Language Modeling Toolkit (SRILM >= 1.6.0) 82 The SRI Language Modeling Toolkit (SRILM >= 1.6.0)
82 Available from : http://www.speech.sri.com/projects/srilm/ 83 Available from : http://www.speech.sri.com/projects/srilm/
83 84
84 SRILM is a toolkit for building and applying statistical language models. 85 SRILM is a toolkit for building and applying statistical language models.
85 86
86 Tomcat ( >= 7.0.0) 87 Tomcat ( >= 7.0.0)
87 Available from : http://tomcat.apache.org/ 88 Available from : http://tomcat.apache.org/
88 and debian packages 89 and debian packages
89 90
90 Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies. 91 Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies.
91 92
92 INSTALL 93 INSTALL
93 ------- 94 -------
94 95
95 See the INSTALL file for the installation procedure. 96 See the INSTALL file for the installation procedure.
96 97
97 Quick install below. 98 Quick install below.
98 99
99 Before launching installation : 100 Before launching installation :
100 101
101 Be certain that all dependencies are satisfied. 102 Be certain that all dependencies are satisfied.
102 Have 300 GB of free space for a complete install. 103 Have 300 GB of free space for a complete install.
103 104
104 Issue the following commands to the shell : 105 Issue the following commands to the shell :
105 $> ./install.sh 106 $> ./install.sh
106 $> export OTMEDIA_HOME=path/to/OTMEDIA/directory 107 $> export OTMEDIA_HOME=path/to/OTMEDIA/directory
107 108
108 Read SOLR.INSTALL part 3 to install SOLRDB. 109 Read SOLR.INSTALL part 3 to install SOLRDB.
109 110
110 RUNNING 111 RUNNING
111 ------- 112 -------
112 113
113 See HOWTO file. 114 See HOWTO file.
114 115
115 ACKNOWLEDGEMENTS 116 ACKNOWLEDGEMENTS
116 ---------------- 117 ----------------
117 118
118 Many thanks to Jean-François Rey for useful help and work done. 119 Many thanks to Jean-François Rey for useful help and work done.
119 120
120 KNOWN BUGS 121 KNOWN BUGS
121 ---------- 122 ----------
122 123
123 Many. 124 Many.
124 For bug reports, please contact Pascal Nocera at pascal.nocera@univ-avignon.fr 125 For bug reports, please contact Pascal Nocera at pascal.nocera@univ-avignon.fr
125 126
126 COPYRIGHT 127 COPYRIGHT
127 --------- 128 ---------
128 129
129 See the COPYING file. 130 See the COPYING file.
130 131
131 AUTHORS 132 AUTHORS
132 ------- 133 -------
133 134
134 Jean-François Rey <jean-francois.rey@univ-avignon.fr> 135 Jean-François Rey <jean-francois.rey@univ-avignon.fr>
135 Hugo Mauchrétien <hugo.mauchretien@univ-avignon.fr> 136 Hugo Mauchrétien <hugo.mauchretien@univ-avignon.fr>
136 Emmanuel Ferreira <emmanuel.ferreira@univ-avignon.fr> 137 Emmanuel Ferreira <emmanuel.ferreira@univ-avignon.fr>
137 138
138 139
1 #!/bin/bash 1 #!/bin/bash
2 2
3 #-------------------# 3 #-------------------#
4 # OTMEDIA LIA # 4 # OTMEDIA LIA #
5 # Install script # 5 # Install script #
6 # version : 1.0.0 # 6 # version : 1.1.0 #
7 #-------------------# 7 #-------------------#
8 8
9 # nyx login name
9 username=${LOGNAME} 10 username=${LOGNAME}
10 11
11 # Color variables 12 # Color variables
12 txtred=$(tput setaf 1) # red 13 txtred=$(tput setaf 1) # red
13 txtgrn=$(tput setaf 2) # Green 14 txtgrn=$(tput setaf 2) # Green
14 txtylw=$(tput setaf 3) # Yellow 15 txtylw=$(tput setaf 3) # Yellow
15 txtblu=$(tput setaf 4) # Blue 16 txtblu=$(tput setaf 4) # Blue
16 txtpur=$(tput setaf 5) # Purple 17 txtpur=$(tput setaf 5) # Purple
17 txtcyn=$(tput setaf 6) # Cyan 18 txtcyn=$(tput setaf 6) # Cyan
18 txtwht=$(tput setaf 7) # White 19 txtwht=$(tput setaf 7) # White
19 txtrst=$(tput sgr0) # Text reset. 20 txtrst=$(tput sgr0) # Text reset.
20 #/color 21 #/color
21 22
22 # 23 #
23 ### Global Variables 24 ### Global Variables
24 # 25 #
25 PWD=$(pwd) 26 PWD=$(pwd)
26 OTMEDIA_HOME=$PWD 27 OTMEDIA_HOME=$PWD
27 test=$(arch) 28 test=$(arch)
28 if [ "$test" == "x86_64" ]; then ARCH=".64"; else ARCH=""; fi 29 if [ "$test" == "x86_64" ]; then ARCH=".64"; else ARCH=""; fi
29 #/Global 30 #/Global
30 31
31 32
32 # 33 #
33 # Put to 0 to disable dependencies of a pass 34 # Put to 0 to disable dependencies of a pass
34 # and 1 to enable 35 # and 1 to enable
35 # 36 #
36 PASS1=1 # First Pass 37 PASS1=1 # First Pass
37 PASS2=1 # Second and Third Pass 38 PASS2=1 # Second and Third Pass
38 CONFPASS=1 # Confidence Pass 39 CONFPASS=1 # Confidence Pass
39 EXPLOITCONFPASS=1 # SOLR query and trigg 40 EXPLOITCONFPASS=1 # SOLR query and trigg
40 41
41 echo -e "\nWill do install for :" 42 echo -e "\nWill do install for :"
42 if [ $PASS1 -eq 1 ];then echo "- Pass 1";fi 43 if [ $PASS1 -eq 1 ];then echo "- Pass 1";fi
43 if [ $PASS2 -eq 1 ];then echo "- Pass 2";fi 44 if [ $PASS2 -eq 1 ];then echo "- Pass 2";fi
44 if [ $CONFPASS -eq 1 ];then echo "- Confidence Pass";fi 45 if [ $CONFPASS -eq 1 ];then echo "- Confidence Pass";fi
45 if [ $EXPLOITCONFPASS -eq 1 ];then echo "- Exploit Confidence Pass";fi 46 if [ $EXPLOITCONFPASS -eq 1 ];then echo "- Exploit Confidence Pass";fi
46 47
47 # 48 #
48 ### CHECK Dependencies ### 49 ### CHECK Dependencies ###
49 # 50 #
50 echo -e "\n\t${txtblu}Check Dependencies${txtrst}\n" 51 echo -e "\n\t${txtblu}Check Dependencies${txtrst}\n"
51 52
52 ## make 53 ## make
53 test=$(whereis make) 54 test=$(whereis make)
54 if [ "$test" == "make:" ] 55 if [ "$test" == "make:" ]
55 then 56 then
56 echo -e "${txtred}ERROR${txtrst} make not found\n You have to install make\n sudo apt-get install make" 57 echo -e "${txtred}ERROR${txtrst} make not found\n You have to install make\n sudo apt-get install make"
57 exit 1; 58 exit 1;
58 fi 59 fi
59 echo -e "make \t ${txtgrn}OK${txtrst}" 60 echo -e "make \t ${txtgrn}OK${txtrst}"
60 61
61 ## CC 62 ## CC
62 test=$(whereis cc) 63 test=$(whereis cc)
63 if [ "$test" == "cc:" ] 64 if [ "$test" == "cc:" ]
64 then 65 then
65 echo -e "${txtred}ERROR${txtrst} cc not found\n You have to install cc\n sudo apt-get install gcc" 66 echo -e "${txtred}ERROR${txtrst} cc not found\n You have to install cc\n sudo apt-get install gcc"
66 exit 1; 67 exit 1;
67 fi 68 fi
68 echo -e "cc \t ${txtgrn}OK${txtrst}" 69 echo -e "cc \t ${txtgrn}OK${txtrst}"
69 70
70 ## AVCONV 71 ## AVCONV
71 test=$(whereis avconv) 72 test=$(whereis avconv)
72 if [ "$test" == "avconv:" ] 73 if [ "$test" == "avconv:" ]
73 then 74 then
74 echo -e "${txtred}ERROR${txtrst} avconv not found\n You have to install avconv\n sudo apt-get install libav-tools" 75 echo -e "${txtred}ERROR${txtrst} avconv not found\n You have to install avconv\n sudo apt-get install libav-tools"
75 exit 1; 76 exit 1;
76 fi 77 fi
77 echo -e "libav-tools : avconv \t ${txtgrn}OK${txtrst}" 78 echo -e "libav-tools : avconv \t ${txtgrn}OK${txtrst}"
78 79
79 ## JAVA 80 ## JAVA
80 test=$(whereis java) 81 test=$(whereis java)
81 if [ "$test" == "java:" ] 82 if [ "$test" == "java:" ]
82 then 83 then
83 echo -e "${txtred}ERROR${txtrst} java not found\n You have to install java JRE\n sudo apt-get install openjdk-7-jre" 84 echo -e "${txtred}ERROR${txtrst} java not found\n You have to install java JRE\n sudo apt-get install openjdk-7-jre"
84 exit 1; 85 exit 1;
85 fi 86 fi
86 echo -e "Java : JRE \t ${txtgrn}OK${txtrst}" 87 echo -e "Java : JRE \t ${txtgrn}OK${txtrst}"
87 test=$(whereis javac) 88 test=$(whereis javac)
88 if [ "$test" == "javac:" ] 89 if [ "$test" == "javac:" ]
89 then 90 then
90 echo -e "${txtred}ERROR${txtrst} javac not found\n You have to install java JDK\n sudo apt-get install openjdk-7-jdk" 91 echo -e "${txtred}ERROR${txtrst} javac not found\n You have to install java JDK\n sudo apt-get install openjdk-7-jdk"
91 exit 1; 92 exit 1;
92 fi 93 fi
93 echo -e "Java : JDK \t ${txtgrn}OK${txtrst}" 94 echo -e "Java : JDK \t ${txtgrn}OK${txtrst}"
94 95
95 if [ $EXPLOITCONFPASS -eq 1 ] 96 if [ $EXPLOITCONFPASS -eq 1 ]
96 then 97 then
97 ## Python 98 ## Python
98 test=$(whereis python) 99 test=$(whereis python)
99 if [ "$test" == "python:" ] 100 if [ "$test" == "python:" ]
100 then 101 then
101 echo -e "${txtred}ERROR${txtrst} python not found\n You have to install python\n sudo apt-get install python" 102 echo -e "${txtred}ERROR${txtrst} python not found\n You have to install python\n sudo apt-get install python"
102 exit 1; 103 exit 1;
103 fi 104 fi
104 echo -e "python : \t ${txtgrn}OK${txtrst}" 105 echo -e "python : \t ${txtgrn}OK${txtrst}"
105 106
106 ## csh shell 107 ## csh shell
107 test=$(whereis csh) 108 test=$(whereis csh)
108 if [ "$test" == "csh:" ] 109 if [ "$test" == "csh:" ]
109 then 110 then
110 echo -e "${txtred}ERROR${txtrst} csh shell not found\n You have to install csh shell\n sudo apt-get install csh" 111 echo -e "${txtred}ERROR${txtrst} csh shell not found\n You have to install csh shell\n sudo apt-get install csh"
111 exit 1; 112 exit 1;
112 fi 113 fi
113 echo -e "csh shell : \t ${txtgrn}OK${txtrst}" 114 echo -e "csh shell : \t ${txtgrn}OK${txtrst}"
114 fi 115 fi
115 116
116 ## Perl 117 ## Perl
117 test=$(whereis perl) 118 test=$(whereis perl)
118 if [ "$test" == "perl:" ] 119 if [ "$test" == "perl:" ]
119 then 120 then
120 echo -e "${txtred}ERROR${txtrst} perl not found\n You have to install perl\n sudo apt-get install perl" 121 echo -e "${txtred}ERROR${txtrst} perl not found\n You have to install perl\n sudo apt-get install perl"
121 exit 1; 122 exit 1;
122 fi 123 fi
123 echo -e "perl : \t ${txtgrn}OK${txtrst}" 124 echo -e "perl : \t ${txtgrn}OK${txtrst}"
124 125
125 ## iconv 126 ## iconv
126 test=$(whereis iconv) 127 test=$(whereis iconv)
127 if [ "$test" == "iconv:" ] 128 if [ "$test" == "iconv:" ]
128 then 129 then
129 echo -e "${txtred}ERROR${txtrst} iconv not found\n You have to install iconv\n sudo apt-cache search iconv" 130 echo -e "${txtred}ERROR${txtrst} iconv not found\n You have to install iconv\n sudo apt-cache search iconv"
130 exit 1; 131 exit 1;
131 fi 132 fi
132 echo -e "iconv : \t ${txtgrn}OK${txtrst}" 133 echo -e "iconv : \t ${txtgrn}OK${txtrst}"
133 134
134 ## SRI LM 135 ## SRI LM
135 if [ -z "$SRILM" ] || [ -z "$MACHINE_TYPE" ] 136 if [ -z "$SRILM" ] || [ -z "$MACHINE_TYPE" ]
136 then 137 then
137 echo -e "${txtred}ERROR${txtrst} SRILM toolkit variables are not defined (SRILM and MACHINE_TYPE)\n You have to install SRILM Toolkit\n" 138 echo -e "${txtred}ERROR${txtrst} SRILM toolkit variables are not defined (SRILM and MACHINE_TYPE)\n You have to install SRILM Toolkit\n"
138 exit 1; 139 exit 1;
139 fi 140 fi
140 export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE 141 export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE
141 echo -e "SRILM toolkit : \t ${txtgrn}OK${txtrst}" 142 echo -e "SRILM toolkit : \t ${txtgrn}OK${txtrst}"
142 143
143 ### Speeral Configuration ### 144 ### Speeral Configuration ###
144 145
145 echo -e "\n\t${txtblu}Speeral configuration${txtrst}\n" 146 echo -e "\n\t${txtblu}Speeral configuration${txtrst}\n"
146 echo -e "Download Speeral bin and data :" 147 read -e -p "Download Speeral bin and data ? (y/n) " speeral
147 scp -r ${username}@nyx:~/OTMEDIA_DATA/Speeral $OTMEDIA_HOME/tools/ 148 if [ "$speeral" == "y" ]
149 then
150 echo -e "Download Speeral bin and data :"
151 scp -r ${username}@nyx:/local/OTMEDIA/OTMEDIA_DATA/Speeral $OTMEDIA_HOME/tools/
152 fi
148 echo -e "\n\t${txtblu}Generating Speeral configuration files :${txtrst}\n" 153 echo -e "\n\t${txtblu}Generating Speeral configuration files :${txtrst}\n"
149 cat $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ 154 cat $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \
150 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ 155 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \
151 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ 156 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \
152 > $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml 157 > $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml
153 echo $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml 158 echo $PWD/tools/Speeral/CFG/SpeeralFirstPass.xml
154 cat $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ 159 cat $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \
155 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ 160 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \
156 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ 161 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \
157 > $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml 162 > $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml
158 echo $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml 163 echo $PWD/tools/Speeral/CFG/SpeeralSecondPass.xml
159 cat $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \ 164 cat $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml.tmp | sed -e "s|<nom>[^<]*</nom>|<nom>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer</nom>|g" \
160 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \ 165 | sed -e "s|<ngramme>[^<]*</ngramme>|<ngramme>$PWD/tools/Speeral/LM/ML_4gOTMEDIA_LEXIQUE_V6</ngramme>|g" \
161 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \ 166 | sed -e "s|<binode>[^<]*</binode>|<binode>$PWD/tools/Speeral/LEX/LEXIQUE_V6.speer.bin</binode>|g" \
162 > $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml 167 > $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml
163 echo $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml 168 echo $PWD/tools/Speeral/CFG/SpeeralThirdPass.xml
164 169
165 170
166 if [ $EXPLOITCONFPASS -eq 1 ] 171 if [ $EXPLOITCONFPASS -eq 1 ]
167 then 172 then
168 ### LIA ltbox ### 173 ### LIA ltbox ###
169 echo -e "\t${txtblu}Install lia_ltbox${txtrst}\n" 174 echo -e "\t${txtblu}Install lia_ltbox${txtrst}\n"
170 export LIA_TAGG_LANG="french" 175 export LIA_TAGG_LANG="french"
171 export LIA_TAGG="$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" 176 export LIA_TAGG="$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/"
172 export LIA_PHON_REP="$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" 177 export LIA_PHON_REP="$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/"
173 export LIA_BIGLEX="$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" 178 export LIA_BIGLEX="$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/"
174 179
175 ### config lia_phon 180 ### config lia_phon
176 cd $LIA_PHON_REP 181 cd $LIA_PHON_REP
177 make all > /dev/null 182 make all > /dev/null
178 make ressource > /dev/null 183 make ressource > /dev/null
179 ### config lia_tagg 184 ### config lia_tagg
180 cd $LIA_TAGG 185 cd $LIA_TAGG
181 make all > /dev/null 186 make all > /dev/null
182 make ressource.french > /dev/null 187 make ressource.french > /dev/null
183 ### config lia_biglex 188 ### config lia_biglex
184 cd $LIA_BIGLEX 189 cd $LIA_BIGLEX
185 make -f makefile.biglex > /dev/null 190 make -f makefile.biglex > /dev/null
186 cd $OTMEDIA_HOME 191 cd $OTMEDIA_HOME
187 192
188 193
189 ### SOLR DB ### 194 ### SOLR DB ###
190 # Tomcat first 195 # Tomcat first
191 test=$(dpkg -l | grep "^ii" | grep tomcat) 196 test=$(dpkg -l | grep "^ii" | grep tomcat)
192 if [ "$test" == "" ] 197 if [ "$test" == "" ]
193 then 198 then
194 echo -e "${txtred}ERROR${txtrst} TOMCAT does not seem to be installed\n You have to install TOMCAT\n" 199 echo -e "${txtred}ERROR${txtrst} TOMCAT does not seem to be installed\n You have to install TOMCAT\n"
195 #exit 1; 200 #exit 1;
196 fi 201 fi
197 echo -e "\nTOMCAT : \t ${txtgrn}OK${txtrst}\n" 202 echo -e "\nTOMCAT : \t ${txtgrn}OK${txtrst}\n"
198 # SOLR secondly 203 # SOLR secondly
199 echo -e "\t${txtblu}Install SOLR DB${txtrst}\n" 204 echo -e "\t${txtblu}Install SOLR DB${txtrst}\n"
200 echo -e "You will need 300 GB of free space to install SOLR DB" 205 echo -e "You will need 300 GB of free space to install SOLR DB"
201 read -p "Continue ? (y/n) " solr 206 read -p "Continue ? (y/n) " solr
202 if [ "$solr" == "y" ] 207 if [ "$solr" == "y" ]
203 then 208 then
204 209
205 echo -e "Download SOLR DB\r" 210 echo -e "Download SOLR DB\r"
206 mkdir -p $OTMEDIA_HOME/tools/SOLR 2> /dev/null 211 mkdir -p $OTMEDIA_HOME/tools/SOLR 2> /dev/null
207 scp -r ${username}@nyx:~/OTMEDIA_DATA/SOLR/otmedia-2013-04.tar.gz $OTMEDIA_HOME/tools/SOLR 212 scp -r ${username}@nyx:/local/OTMEDIA/OTMEDIA_DATA/SOLR/otmedia-2013-04.tar.gz $OTMEDIA_HOME/tools/SOLR
208 echo -e "Unzip SOLR DB\r" 213 echo -e "Unzip SOLR DB\r"
209 res=0 214 res=0
210 #res = $(tar -xvzf "$OTMEDIA_HOME/tools/SOLR/otmedia-2013-04.tar.gz" "$OTMEDIA_HOME/tools/SOLR/") 215 #res = $(tar -xvzf "$OTMEDIA_HOME/tools/SOLR/otmedia-2013-04.tar.gz" "$OTMEDIA_HOME/tools/SOLR/")
211 if [ $res -eq 2 ]; then echo " ${txtred}NOT OK${txtrst}"; 216 if [ $res -eq 2 ]; then echo " ${txtred}NOT OK${txtrst}";
212 else echo " ${txtgrn}OK${txtrst}"; fi 217 else echo " ${txtgrn}OK${txtrst}"; fi
213 else 218 else
214 echo "Skipping SOLR install" 219 echo "Skipping SOLR install"
215 fi 220 fi
216 read -e -p "Configure SOLR DB server ? (y/n) " solr 221 read -e -p "Configure SOLR DB server ? (y/n) " solr
217 if [ "$solr" == "y" ] 222 if [ "$solr" == "y" ]
218 then 223 then
219 read -p "Enter SOLR server IP :" ip 224 read -p "Enter SOLR server IP :" ip
220 if [ "${ip}" == "" ];then ip="localhost";fi 225 if [ "${ip}" == "" ];then ip="localhost";fi
221 echo "machine = \"${ip}\"" > $OTMEDIA_HOME/tools/scripts/solrinfo.py 226 echo "machine = \"${ip}\"" > $OTMEDIA_HOME/tools/scripts/solrinfo.py
222 read -p "Enter SOLR server port :" port 227 read -p "Enter SOLR server port :" port
223 if [ "${port}" == "" ]; then port="8080";fi 228 if [ "${port}" == "" ]; then port="8080";fi
224 echo -e "\n\tSOLR server IP ${ip}" 229 echo -e "\n\tSOLR server IP ${ip}"
225 echo -e "\tSOLR server port ${port}" 230 echo -e "\tSOLR server port ${port}"
226 echo "port = \"${port}\"" >> $OTMEDIA_HOME/tools/scripts/solrinfo.py 231 echo "port = \"${port}\"" >> $OTMEDIA_HOME/tools/scripts/solrinfo.py
227 else 232 else
228 echo "Skipping SOLR DB Configuration" 233 echo "Skipping SOLR DB Configuration"
229 fi 234 fi
230 echo -e "\nSee SOLR.INSTALL file for more information\n" 235 echo -e "\nSee SOLR.INSTALL file for more information\n"
231 fi 236 fi
232 237
233 ### Set Variables in bashrc ### 238 ### Set Variables in bashrc ###
234 cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" > ~/.bashrc.org 239 cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" > ~/.bashrc.org
235 #cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" | grep -v "LIA_TAGG" | grep -v "LIA_PHON" | grep -v "LIA_BIGLEX" > ~/.bashrc.org 240 #cat ~/.bashrc | grep -v "OTMEDIA_HOME" | grep -v "SRILM_BIN" | grep -v "LIA_TAGG" | grep -v "LIA_PHON" | grep -v "LIA_BIGLEX" > ~/.bashrc.org
236 cp ~/.bashrc.org ~/.bashrc 241 cp ~/.bashrc.org ~/.bashrc
237 export OTMEDIA_HOME=$PWD 242 export OTMEDIA_HOME=$PWD
238 echo "export OTMEDIA_HOME=$PWD" >> ~/.bashrc 243 echo "export OTMEDIA_HOME=$PWD" >> ~/.bashrc
239 echo "export PATH=\$PATH:$PWD/main_tools" >> ~/.bashrc 244 echo "export PATH=\$PATH:$PWD/main_tools" >> ~/.bashrc
240 echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE" >> ~/.bashrc 245 echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE" >> ~/.bashrc
241 #echo "export LIA_TAGG_LANG=french" >> ~/.bashrc 246 #echo "export LIA_TAGG_LANG=french" >> ~/.bashrc
242 #echo "export LIA_TAGG=$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" >> ~/.bashrc 247 #echo "export LIA_TAGG=$OTMEDIA_HOME/tools/lia_ltbox/lia_tagg/" >> ~/.bashrc
243 #echo "export LIA_PHON_REP=$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" >> ~/.bashrc 248 #echo "export LIA_PHON_REP=$OTMEDIA_HOME/tools/lia_ltbox/lia_phon/" >> ~/.bashrc
244 #echo "export LIA_BIGLEX=$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" >> ~/.bashrc 249 #echo "export LIA_BIGLEX=$OTMEDIA_HOME/tools/lia_ltbox/lia_biglex/" >> ~/.bashrc
245 250
246 # set global configuration file 251 # set global configuration file
247 echo "OTMEDIA_HOME=$PWD" > $OTMEDIA_HOME/cfg/main_cfg.cfg 252 echo "OTMEDIA_HOME=$PWD" > $OTMEDIA_HOME/cfg/main_cfg.cfg
248 echo "ARCH=$ARCH" >> $OTMEDIA_HOME/cfg/main_cfg.cfg 253 echo "ARCH=$ARCH" >> $OTMEDIA_HOME/cfg/main_cfg.cfg
249 echo "VERBOSE=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg 254 echo "VERBOSE=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg
250 echo "DEBUG=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg 255 echo "DEBUG=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg
251 echo "CHECK=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg 256 echo "CHECK=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg
252 echo "RERUN=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg 257 echo "RERUN=0" >> $OTMEDIA_HOME/cfg/main_cfg.cfg
253 258
254 echo -e "\n\t${txtgrn}### Install completed ###${txtrst}\n" 259 echo -e "\n\t${txtgrn}### Install completed ###${txtrst}\n"
255 echo -e "do : source ~/.bashrc" 260 echo -e "do : source ~/.bashrc"
256 echo -e "or set variable :\n" 261 echo -e "or set variable :\n"
257 echo "export OTMEDIA_HOME=$PWD" 262 echo "export OTMEDIA_HOME=$PWD"
258 echo "export PATH=\$PATH:$OTMEDIA_HOME/main_tools" 263 echo "export PATH=\$PATH:$OTMEDIA_HOME/main_tools"
259 echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE" 264 echo "export SRILM_BIN=$SRILM/bin/$MACHINE_TYPE"
260 265
261 266
262 echo " \\\\ " 267 echo "${txtwht} \\\\ "
263 echo " ,-~~~-\\\\_" 268 echo " ,-~~~-\\\\_"
264 echo " ( .\ " 269 echo " ( .\ "
265 echo " @\___(__--'" 270 echo " @\___(__--'${txtrst}"
266 271
267 echo "${txtgrn}Yes${txtylw}I${txtred}Rastafari${txtrst}" 272 echo "${txtgrn}Yes${txtylw}I${txtred}Rastafari${txtrst}"
268 273
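Review note: the bashrc step above filters out old entries with `grep -v` and then appends fresh `export` lines. The sketch below shows the same idea as one reusable function. It assumes a hypothetical `set_export` helper and works on a scratch file rather than the real ~/.bashrc; neither exists in install.sh.

```shell
#!/bin/bash
# Hypothetical helper: replace (or add) one "export NAME=..." line in a file.
set_export() {
    local file=$1 name=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    # Drop any previous definition of NAME and keep every other line.
    grep -v "^export ${name}=" "$file" > "$tmp"
    # The value is written verbatim, so a value such as $PATH:/opt/x is
    # expanded when the file is sourced, not when it is written.
    printf 'export %s=%s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Usage on a scratch file (paths are placeholders).
rc=$(mktemp)
set_export "$rc" OTMEDIA_HOME /opt/otmedia
set_export "$rc" PATH '$PATH:/opt/otmedia/main_tools'
set_export "$rc" OTMEDIA_HOME /srv/otmedia   # replaces the first entry
cat "$rc"
```

Passing the PATH value in single quotes keeps `$PATH` literal in the file, which is what the `export PATH=` line in the install script needs.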
main_tools/ConfidenceMeasure.sh
1 #!/bin/bash 1 #!/bin/bash
2 #----------------------------------------------------------------------------------------- 2 #-----------------------------------------------------------------------------------------
3 # Author : Benjamin Lecouteux & Emmanuel FERREIRA (contact emmanuel.ferreira0194@gmail.com) 3 # Author : Benjamin Lecouteux & Emmanuel FERREIRA (contact emmanuel.ferreira0194@gmail.com)
4 # Brief: Compute the confidence measures of a transcription (Speeral res files) 4 # Brief: Compute the confidence measures of a transcription (Speeral res files)
5 #----------------------------------------------------------------------------------------- 5 #-----------------------------------------------------------------------------------------
6 6
7 # where is ConfidenceMeasure.sh 7 # where is ConfidenceMeasure.sh
8 if [ -z $MAIN_SCRIPT_PATH ]; then MAIN_SCRIPT_PATH=$(dirname $(readlink -e $0)); fi 8 if [ -z $MAIN_SCRIPT_PATH ]; then MAIN_SCRIPT_PATH=$(dirname $(readlink -e $0)); fi
9 9
10 # where is ConfidenceMeasure.cfg 10 # where is ConfidenceMeasure.cfg
11 CONFIDENCEMEASURE_CONFIG_FILE=$OTMEDIA_HOME"/cfg/ConfidenceMeasure.cfg" 11 CONFIDENCEMEASURE_CONFIG_FILE=$OTMEDIA_HOME"/cfg/ConfidenceMeasure.cfg"
12 if [ -e $CONFIDENCEMEASURE_CONFIG_FILE ] 12 if [ -e $CONFIDENCEMEASURE_CONFIG_FILE ]
13 then 13 then
14 . $CONFIDENCEMEASURE_CONFIG_FILE 14 . $CONFIDENCEMEASURE_CONFIG_FILE
15 else 15 else
16 echo "ERROR : Can't find configuration file $CONFIDENCEMEASURE_CONFIG_FILE" >&2 16 echo "ERROR : Can't find configuration file $CONFIDENCEMEASURE_CONFIG_FILE" >&2
17 exit 1 17 exit 1
18 fi 18 fi
19 19
20 PACKAGE_CONF_MEASURE=$CONFIDENCEMEASURE_CONFIG_FILE 20 PACKAGE_CONF_MEASURE=$CONFIDENCEMEASURE_CONFIG_FILE
21 #------------------ 21 #------------------
22 # Parse options 22 # Parse options
23 #------------------- 23 #-------------------
24 while getopts ":c:s:h" OPTION 24 while getopts ":c:s:h" OPTION
25 do 25 do
26 case $OPTION in 26 case $OPTION in
27 h) #Display help 27 h) #Display help
28 echo -e "$0 :" 28 echo -e "$0 :"
29 echo -e "\tAuthor : Benjamin Lecouteux & Emmanuel FERREIRA (contact: emmanuel.ferreira0194@gmail.com)" 29 echo -e "\tAuthor : Benjamin Lecouteux & Emmanuel FERREIRA (contact: emmanuel.ferreira0194@gmail.com)"
30 echo -e "\tVersion : 2.0" 30 echo -e "\tVersion : 2.0"
31 echo -e "\tBrief : Determine confidence measure of a transcription" 31 echo -e "\tBrief : Determine confidence measure of a transcription"
32 echo -e "\tUsage : $0 [OPTIONS] <(i) REP_IN> <REP_NAME>" 32 echo -e "\tUsage : $0 [OPTIONS] <(i) REP_IN> <REP_NAME>"
33 echo -e "\tOptions:" 33 echo -e "\tOptions:"
34 echo -e "\t\tc) specify the path of the configuration file (default $PACKAGE_CONF_MEASURE)" 34 echo -e "\t\tc) specify the path of the configuration file (default $PACKAGE_CONF_MEASURE)"
35 echo -e "\t\ts) specify PORT@HOST of a SRILM server" 35 echo -e "\t\ts) specify PORT@HOST of a SRILM server"
36 exit 1 36 exit 1
37 ;; 37 ;;
38 c) #Change the configuration file 38 c) #Change the configuration file
39 PACKAGE_CONF_MEASURE=$OPTARG 39 PACKAGE_CONF_MEASURE=$OPTARG
40 ;; 40 ;;
41 s) #use an SRILM server (avoid loading arpa model in memory) 41 s) #use an SRILM server (avoid loading arpa model in memory)
42 SERVER=$OPTARG 42 SERVER=$OPTARG
43 ;; 43 ;;
44 :) 44 :)
45 echo "BAD USAGE : OPTION $OPTARG needs a value" 45 echo "BAD USAGE : OPTION $OPTARG needs a value"
46 exit 1 46 exit 1
47 ;; 47 ;;
48 \?) 48 \?)
49 echo "BAD USAGE : unknown option '$OPTARG'" 49 echo "BAD USAGE : unknown option '$OPTARG'"
50 exit 1 50 exit 1
51 ;; 51 ;;
52 esac 52 esac
53 done 53 done
54 54
55 #------------------------------------------- 55 #-------------------------------------------
56 # Shift past the options to retrieve the arguments 56 # Shift past the options to retrieve the arguments
57 #------------------------------------------- 57 #-------------------------------------------
58 shift $((OPTIND-1)) 58 shift $((OPTIND-1))
59 59
60 if [ -z "$1" ] 60 if [ -z "$1" ]
61 then 61 then
62 echo "BAD USAGE: $0 [OPTIONS] <(i) directory (ex:20041006_0800_0900_CULTURE)> <REP_NAME (ex:res_p2)>" 62 echo "BAD USAGE: $0 [OPTIONS] <(i) directory (ex:20041006_0800_0900_CULTURE)> <REP_NAME (ex:res_p2)>"
63 exit 1 63 exit 1
64 fi 64 fi
65 65
66 if [ -z "$2" ] 66 if [ -z "$2" ]
67 then 67 then
68 echo "BAD USAGE: $0 [OPTIONS] <(i) directory (ex:20041006_0800_0900_CULTURE)> <REP_NAME (ex:res_p2)>" 68 echo "BAD USAGE: $0 [OPTIONS] <(i) directory (ex:20041006_0800_0900_CULTURE)> <REP_NAME (ex:res_p2)>"
69 exit 1 69 exit 1
70 fi 70 fi
71 71
72 . $PACKAGE_CONF_MEASURE 72 . $PACKAGE_CONF_MEASURE
73 73
74 #------------------------------------ 74 #------------------------------------
75 # INIT - Create the workspace 75 # INIT - Create the workspace
76 #------------------------------------ 76 #------------------------------------
77 NAME=`basename $1` 77 NAME=`basename $1`
78 CONF_DIR=$1/conf/$2 78 CONF_DIR=$1/conf/$2
79 FICHIER_RES=$2 79 FICHIER_RES=$2
80 REF=$CONF_DIR/ref 80 REF=$CONF_DIR/ref
81 POS=$CONF_DIR/pos 81 POS=$CONF_DIR/pos
82 MLCLASS=$CONF_DIR/mlclass 82 MLCLASS=$CONF_DIR/mlclass
83 GVALIGN=$CONF_DIR/gvalign 83 GVALIGN=$CONF_DIR/gvalign
84 HTK_POST=$CONF_DIR/htk_post 84 HTK_POST=$CONF_DIR/htk_post
85 HTK_LM=$CONF_DIR/htk_lm 85 HTK_LM=$CONF_DIR/htk_lm
86 WLAT=$CONF_DIR/wlat 86 WLAT=$CONF_DIR/wlat
87 LIKELIHOOD=$CONF_DIR/likelihood 87 LIKELIHOOD=$CONF_DIR/likelihood
88 GVCTM=$CONF_DIR/gvctm 88 GVCTM=$CONF_DIR/gvctm
89 SEGCTM=$CONF_DIR/segctm 89 SEGCTM=$CONF_DIR/segctm
90 SUPER_CTM=$CONF_DIR/super_ctm 90 SUPER_CTM=$CONF_DIR/super_ctm
91 SCORED_CTM=$CONF_DIR/scored_ctm 91 SCORED_CTM=$CONF_DIR/scored_ctm
92 mkdir -p $CONF_DIR > /dev/null 2>&1 92 mkdir -p $CONF_DIR > /dev/null 2>&1
93 mkdir -p $REF > /dev/null 2>&1 93 mkdir -p $REF > /dev/null 2>&1
94 mkdir -p $POS > /dev/null 2>&1 94 mkdir -p $POS > /dev/null 2>&1
95 mkdir -p $MLCLASS > /dev/null 2>&1 95 mkdir -p $MLCLASS > /dev/null 2>&1
96 mkdir -p $GVALIGN > /dev/null 2>&1 96 mkdir -p $GVALIGN > /dev/null 2>&1
97 mkdir -p $HTK_POST > /dev/null 2>&1 97 mkdir -p $HTK_POST > /dev/null 2>&1
98 #mkdir -p $HTK_LM ==> created automatically by SRILM if needed 98 #mkdir -p $HTK_LM ==> created automatically by SRILM if needed
99 mkdir -p $WLAT > /dev/null 2>&1 99 mkdir -p $WLAT > /dev/null 2>&1
100 mkdir -p $LIKELIHOOD > /dev/null 2>&1 100 mkdir -p $LIKELIHOOD > /dev/null 2>&1
101 mkdir -p $GVCTM > /dev/null 2>&1 101 mkdir -p $GVCTM > /dev/null 2>&1
102 mkdir -p $SEGCTM > /dev/null 2>&1 102 mkdir -p $SEGCTM > /dev/null 2>&1
103 mkdir -p $SUPER_CTM > /dev/null 2>&1 103 mkdir -p $SUPER_CTM > /dev/null 2>&1
104 mkdir -p $SCORED_CTM > /dev/null 2>&1 104 mkdir -p $SCORED_CTM > /dev/null 2>&1
105 if [ -z $BOOST_BIN ];then 105 if [ -z "$BOOST_BIN" ] && [ "$ARCH" == ".64" ] ;then
106 BOOST_BIN=$ROOT/bin/icsiboost-64bit-static-r160 106 BOOST_BIN=$ROOT/bin/icsiboost-64bit-static-r160
107 fi
108 if [ -z $BOOST_BIN ] ;then
109 BOOST_BIN=$ROOT/bin/icsiboost-32bit-static-r176
107 fi 110 fi
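Review note: the two tests above choose an icsiboost binary from `$ARCH`, falling back to the 32-bit build. A sketch of the same choice as a single branch follows. The function name `pick_boost_bin` is hypothetical; the binary names are copied from the script, and the ".64" value of ARCH is assumed from the test above.

```shell
#!/bin/bash
# Hypothetical function, not part of ConfidenceMeasure.sh: print the path of
# the icsiboost binary that matches the architecture suffix.
pick_boost_bin() {
    local root=$1 arch=$2
    if [ "$arch" == ".64" ]; then
        echo "$root/bin/icsiboost-64bit-static-r160"
    else
        echo "$root/bin/icsiboost-32bit-static-r176"
    fi
}

# Keep a value already supplied by the configuration file, otherwise pick
# one from ARCH. The quoting keeps the test valid when ARCH is empty.
BOOST_BIN=${BOOST_BIN:-$(pick_boost_bin "${ROOT:-.}" "${ARCH:-}")}
echo "$BOOST_BIN"
```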
108 #----------------------------------------------------------------- 111 #-----------------------------------------------------------------
109 # STEP 1 - Lattice extension + adding posteriors (HTK format) 112 # STEP 1 - Lattice extension + adding posteriors (HTK format)
110 #----------------------------------------------------------------- 113 #-----------------------------------------------------------------
111 if [ $EXTEND == 1 ] 114 if [ $EXTEND == 1 ]
112 then 115 then
113 echo "EXTEND step..." 116 echo "EXTEND step..."
114 rm -r $HTK_LM > /dev/null 2>&1 117 rm -r $HTK_LM > /dev/null 2>&1
115 rm $HTK_POST/* > /dev/null 2>&1 118 rm $HTK_POST/* > /dev/null 2>&1
116 # 119 #
117 # --> Add the linguistic scores to the HTK lattices 120 # --> Add the linguistic scores to the HTK lattices
118 # 121 #
119 ls $1/$FICHIER_RES/*.treil > $CONF_DIR/Liste_treil_${NAME}.lst 122 ls $1/$FICHIER_RES/*.treil > $CONF_DIR/Liste_treil_${NAME}.lst
120 123
121 LM_ACCESS="-lm $ML" 124 LM_ACCESS="-lm $ML"
122 if [ ! -z $SERVER ]; then 125 if [ ! -z $SERVER ]; then
123 LM_ACCESS="-use-server $SERVER -cache-served-ngrams" 126 LM_ACCESS="-use-server $SERVER -cache-served-ngrams"
124 fi 127 fi
125 echo "$SRILM_BIN/lattice-tool -read-htk -in-lattice-list $CONF_DIR/Liste_treil_${NAME}.lst $LM_ACCESS -order $ORDER -htk-logbase 10 -htk-lmscale $FUDGE -htk-wdpenalty $PENALITE -write-htk -out-lattice-dir $HTK_LM"; 128 echo "$SRILM_BIN/lattice-tool -read-htk -in-lattice-list $CONF_DIR/Liste_treil_${NAME}.lst $LM_ACCESS -order $ORDER -htk-logbase 10 -htk-lmscale $FUDGE -htk-wdpenalty $PENALITE -write-htk -out-lattice-dir $HTK_LM";
126 $SRILM_BIN/lattice-tool -read-htk -in-lattice-list $CONF_DIR/Liste_treil_${NAME}.lst $LM_ACCESS -order $ORDER -htk-logbase 10 -htk-lmscale $FUDGE -htk-wdpenalty $PENALITE -write-htk -out-lattice-dir $HTK_LM 129 $SRILM_BIN/lattice-tool -read-htk -in-lattice-list $CONF_DIR/Liste_treil_${NAME}.lst $LM_ACCESS -order $ORDER -htk-logbase 10 -htk-lmscale $FUDGE -htk-wdpenalty $PENALITE -write-htk -out-lattice-dir $HTK_LM
127 130
128 # 131 #
129 # --> Compute the posteriors from the acoustic and linguistic scores present in the HTK lattices 132 # --> Compute the posteriors from the acoustic and linguistic scores present in the HTK lattices
130 # 133 #
131 for file in `ls $HTK_LM/*.treil` 134 for file in `ls $HTK_LM/*.treil`
132 do 135 do
133 base=`basename $file .treil`; 136 base=`basename $file .treil`;
134 #echo "lattice-tool -read-htk -in-lattice $file -compute-posteriors -write-htk -out-lattice $HTK_POST/${base}.htk" 137 #echo "lattice-tool -read-htk -in-lattice $file -compute-posteriors -write-htk -out-lattice $HTK_POST/${base}.htk"
135 $SRILM_BIN/lattice-tool -read-htk -in-lattice $file -compute-posteriors -write-htk -out-lattice $HTK_POST/${base}.htk 138 $SRILM_BIN/lattice-tool -read-htk -in-lattice $file -compute-posteriors -write-htk -out-lattice $HTK_POST/${base}.htk
136 done 139 done
137 fi 140 fi
138 141
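Review note: the loops in this script iterate over the output of `ls`, which splits file names on whitespace. The sketch below shows a glob-based alternative under the assumption that only the base names are needed. The function name `list_bases` is hypothetical and does not appear in ConfidenceMeasure.sh.

```shell
#!/bin/bash
# Hypothetical helper: print the base name (extension removed) of every file
# in a directory that carries the given extension.
list_bases() {
    local dir=$1 ext=$2 file
    for file in "$dir"/*"$ext"; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        basename "$file" "$ext"
    done
}

# Usage on a scratch directory, including a name that contains a space.
tmp=$(mktemp -d)
touch "$tmp/a b.treil" "$tmp/c.treil"
list_bases "$tmp" .treil
rm -r "$tmp"
```

A glob also avoids the error message that `ls` prints when the directory holds no matching file.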
139 #--------------------------------------------------------------------------------------------------------------- 142 #---------------------------------------------------------------------------------------------------------------
140 # STEP 2 - align res and wlat to create a res with scores + info (uses a modified fastnc) 143 # STEP 2 - align res and wlat to create a res with scores + info (uses a modified fastnc)
141 # Example : 144 # Example :
142 # ok amendement 0.814885 ( time=36 nodes=3 min=0.0016862 max=0.814885 mean=0.333896 var=0.363849 svar=0.603199 ) 145 # ok amendement 0.814885 ( time=36 nodes=3 min=0.0016862 max=0.814885 mean=0.333896 var=0.363849 svar=0.603199 )
143 #---------------------------------------------------------------------------------------------------------------- 146 #----------------------------------------------------------------------------------------------------------------
144 if [ $FASTNC == 1 ] 147 if [ $FASTNC == 1 ]
145 then 148 then
146 echo "FASTNC step..." 149 echo "FASTNC step..."
147 rm -f $POS/* $WLAT/* > /dev/null 2>&1 150 rm -f $POS/* $WLAT/* > /dev/null 2>&1
148 for file in `ls $HTK_LM/*.treil` 151 for file in `ls $HTK_LM/*.treil`
149 do 152 do
150 base=`basename $file .treil`; 153 base=`basename $file .treil`;
151 #echo "$ROOT/bin/fastnc_v1.4 $HTK_POST/${base}.htk $WLAT/${base}.wlat $1/$FICHIER_RES/${base}.res rien -dtw2 > $POS/$base.pos2&" 154 #echo "$ROOT/bin/fastnc_v1.4 $HTK_POST/${base}.htk $WLAT/${base}.wlat $1/$FICHIER_RES/${base}.res rien -dtw2 > $POS/$base.pos2&"
152 $ROOT/bin/fastnc_v1.4 $HTK_POST/${base}.htk $WLAT/${base}.wlat $1/$FICHIER_RES/${base}.res rien -dtw2 > $POS/$base.pos2 155 $ROOT/bin/fastnc_v1.4 $HTK_POST/${base}.htk $WLAT/${base}.wlat $1/$FICHIER_RES/${base}.res rien -dtw2 > $POS/$base.pos2
153 done 156 done
154 fi 157 fi
155 158
156 #------------------------------------------------------------------------------------------------------------ 159 #------------------------------------------------------------------------------------------------------------
157 # STEP 3 - retrieve the probability of each word + language model information (backoff, ...) 160 # STEP 3 - retrieve the probability of each word + language model information (backoff, ...)
158 #------------------------------------------------------------------------------------------------------------ 161 #------------------------------------------------------------------------------------------------------------
159 if [ $PPL == 1 ] 162 if [ $PPL == 1 ]
160 then 163 then
161 echo "PPL step..." 164 echo "PPL step..."
162 rm -f $REF/* $CONF_DIR/${NAME}_ALLREF.* $MLCLASS/* > /dev/null 2>&1 165 rm -f $REF/* $CONF_DIR/${NAME}_ALLREF.* $MLCLASS/* > /dev/null 2>&1
163 # 166 #
164 # --> Create the references from the .res files (only if a .treil is present) 167 # --> Create the references from the .res files (only if a .treil is present)
165 # 168 #
166 for file in `ls $1/$FICHIER_RES/*.res` 169 for file in `ls $1/$FICHIER_RES/*.res`
167 do 170 do
168 base=`basename $file .res`; 171 base=`basename $file .res`;
169 if [ -f $1/$FICHIER_RES/$base.treil ];then 172 if [ -f $1/$FICHIER_RES/$base.treil ];then
170 cat $file | cut -f5 -d' ' | tr "\n" " " > $REF/${base}.ref 173 cat $file | cut -f5 -d' ' | tr "\n" " " > $REF/${base}.ref
171 fi 174 fi
172 done 175 done
173 176
174 # 177 #
175 # --> create a file containing all the transcriptions of the show 178 # --> create a file containing all the transcriptions of the show
176 # 179 #
177 compteur=0 180 compteur=0
178 for file in `du -sh $REF/*.ref | grep -v "^0" | cut -f2` 181 for file in `du -sh $REF/*.ref | grep -v "^0" | cut -f2`
179 do 182 do
180 base=`basename $file .ref`; 183 base=`basename $file .ref`;
181 cat $file >> $CONF_DIR/${NAME}_ALLREF.txt 184 cat $file >> $CONF_DIR/${NAME}_ALLREF.txt
182 echo "" >> $CONF_DIR/${NAME}_ALLREF.txt 185 echo "" >> $CONF_DIR/${NAME}_ALLREF.txt
183 ListeFichiers[$compteur]=$base.mlclass 186 ListeFichiers[$compteur]=$base.mlclass
184 compteur=$(( $compteur + 1 )) 187 compteur=$(( $compteur + 1 ))
185 done 188 done
186 189
187 # 190 #
188 # --> retrieve the probability of each word from the ASR results + linguistic information (backoff used, ngram,...) 191 # --> retrieve the probability of each word from the ASR results + linguistic information (backoff used, ngram,...)
189 # 192 #
190 $SRILM_BIN/ngram -lm $ML -order $ORDER -ppl $CONF_DIR/${NAME}_ALLREF.txt -debug 2 > $CONF_DIR/${NAME}_ALLREF.mlclass 193 $SRILM_BIN/ngram -lm $ML -order $ORDER -ppl $CONF_DIR/${NAME}_ALLREF.txt -debug 2 > $CONF_DIR/${NAME}_ALLREF.mlclass
191 194
192 # 195 #
193 # --> create one file per .ref file 196 # --> create one file per .ref file
194 # 197 #
195 compteur=0 198 compteur=0
196 cat $CONF_DIR/${NAME}_ALLREF.mlclass | while read line 199 cat $CONF_DIR/${NAME}_ALLREF.mlclass | while read line
197 do 200 do
198 echo $line | grep "^$" > /dev/null 201 echo $line | grep "^$" > /dev/null
199 if [ $? == 0 ];then 202 if [ $? == 0 ];then
200 compteur=$(( $compteur + 1 )) 203 compteur=$(( $compteur + 1 ))
201 else 204 else
202 echo "$line" | grep "p(" > /dev/null 205 echo "$line" | grep "p(" > /dev/null
203 if [ $? == 0 ];then 206 if [ $? == 0 ];then
204 echo "$line" >> $MLCLASS/${ListeFichiers[${compteur}]}; 207 echo "$line" >> $MLCLASS/${ListeFichiers[${compteur}]};
205 fi 208 fi
206 fi 209 fi
207 done 210 done
208 fi 211 fi
209 212
210 #---------------------------------------------------------- 213 #----------------------------------------------------------
211 # STEP 4 - retrieve the acoustic score of each word 214 # STEP 4 - retrieve the acoustic score of each word
212 #---------------------------------------------------------- 215 #----------------------------------------------------------
213 if [ $ACOUST == 1 ] 216 if [ $ACOUST == 1 ]
214 then 217 then
215 echo "ACOUST step..." 218 echo "ACOUST step..."
216 rm -f $GVALIGN/* > /dev/null 2>&1 219 rm -f $GVALIGN/* > /dev/null 2>&1
217 rm -f $GVCTM/* > /dev/null 2>&1 220 rm -f $GVCTM/* > /dev/null 2>&1
218 rm -f $SEGCTM/* > /dev/null 2>&1 221 rm -f $SEGCTM/* > /dev/null 2>&1
219 rm -f $LIKELIHOOD/* > /dev/null 2>&1 222 rm -f $LIKELIHOOD/* > /dev/null 2>&1
220 223
221 for file in `ls $1/$FICHIER_RES/*.res` 224 for file in `ls $1/$FICHIER_RES/*.res`
222 do 225 do
223 base=`basename $file .res` 226 base=`basename $file .res`
224 if [ -f $1/$FICHIER_RES/$base.treil ];then 227 if [ -f $1/$FICHIER_RES/$base.treil ];then
225 #echo "$ROOT/script/MakeListForGVAlign.pl $file $GVALIGN"; 228 #echo "$ROOT/script/MakeListForGVAlign.pl $file $GVALIGN";
226 $ROOT/script/MakeListForGVAlign.pl $file $GVALIGN 229 $ROOT/script/MakeListForGVAlign.pl $file $GVALIGN
227 fi 230 fi
228 done 231 done
229 232
230 for file in `ls $GVALIGN/*.gvalign` 233 for file in `ls $GVALIGN/*.gvalign`
231 do 234 do
232 base=`basename $file .gvalign`; 235 base=`basename $file .gvalign`;
233 236
234 type=`echo $base | cut -f2 -d: | cut -f2- -d\# | sed -e "s/[0-9]\+//"` 237 type=`echo $base | cut -f2 -d: | cut -f2- -d\# | sed -e "s/[0-9]\+//"`
235 238
236 case "$type" in 239 case "$type" in
237 "M#S") 240 "M#S")
238 HMM=$mod_ms 241 HMM=$mod_ms
239 ;; 242 ;;
240 "F#S") 243 "F#S")
241 HMM=$mod_fs 244 HMM=$mod_fs
242 ;; 245 ;;
243 "M#T") 246 "M#T")
244 HMM=$mod_mt 247 HMM=$mod_mt
245 ;; 248 ;;
246 "F#T") 249 "F#T")
247 HMM=$mod_ft 250 HMM=$mod_ft
248 ;; 251 ;;
249 esac 252 esac
250 253
251 #echo "$ROOT/bin/gvalign.old $HMM $PHON $file -e $1/${REP_PLP}/ -f .plp -r $GVALIGN -g .gv -C FAST -W $GVCTM -O CTM -s $SEGCTM > $LIKELIHOOD/${base}.likelihood | sed -e 's/Decoding/\\nDecoding/g' > $LIKELIHOOD/${base}.likelihood"; 254 #echo "$ROOT/bin/gvalign.old $HMM $PHON $file -e $1/${REP_PLP}/ -f .plp -r $GVALIGN -g .gv -C FAST -W $GVCTM -O CTM -s $SEGCTM > $LIKELIHOOD/${base}.likelihood | sed -e 's/Decoding/\\nDecoding/g' > $LIKELIHOOD/${base}.likelihood";
252 #$ROOT/bin/gvalign.old $HMM $PHON $file -e $1/${REP_PLP}/ -f .plp -r $GVALIGN -g .gv -C FAST -W $GVCTM -O CTM -s $SEGCTM | sed -e 's/Decoding/\nDecoding/g' > $LIKELIHOOD/${base}.likelihood 255 #$ROOT/bin/gvalign.old $HMM $PHON $file -e $1/${REP_PLP}/ -f .plp -r $GVALIGN -g .gv -C FAST -W $GVCTM -O CTM -s $SEGCTM | sed -e 's/Decoding/\nDecoding/g' > $LIKELIHOOD/${base}.likelihood
253 touch $LIKELIHOOD/${base}.likelihood 256 touch $LIKELIHOOD/${base}.likelihood
254 done 257 done
255 fi 258 fi
256 259
257 #-------------------------------------------------------------------------------------------------------------------------------- 260 #--------------------------------------------------------------------------------------------------------------------------------
258 # STEP 5 - Merge all computed scores => res (ctm) with the scores/parameters used in classification 261 # STEP 5 - Merge all computed scores => res (ctm) with the scores/parameters used in classification
259 # Format : 262 # Format :
260 # mot NbNode MinNode MaxNode MeanNode VarNode SVarNode Posterior AcousticLogLikelihood AcousticLogLikelihood/Frame ... 263 # mot NbNode MinNode MaxNode MeanNode VarNode SVarNode Posterior AcousticLogLikelihood AcousticLogLikelihood/Frame ...
261 # AcousticConfidenceLikelihood AcousticConstraintLikeLihood AcousticNoConstraint Likelihood ClasseRepliLinguistique ... 264 # AcousticConfidenceLikelihood AcousticConstraintLikeLihood AcousticNoConstraint Likelihood ClasseRepliLinguistique ...
262 # RepliLinguistique LogLinguistique LogUnigramme NbMotsFenetre NbNulNode NbTrame 265 # RepliLinguistique LogLinguistique LogUnigramme NbMotsFenetre NbNulNode NbTrame
263 #--------------------------------------------------------------------------------------------------------------------------------- 266 #---------------------------------------------------------------------------------------------------------------------------------
264 if [ $EXTRACT == 1 ] 267 if [ $EXTRACT == 1 ]
265 then 268 then
266 echo "EXTRACT step..." 269 echo "EXTRACT step..."
267 rm -f $SUPER_CTM/* > /dev/null 2>&1 270 rm -f $SUPER_CTM/* > /dev/null 2>&1
268 271
269 for file in `ls $1/$FICHIER_RES/*.res` 272 for file in `ls $1/$FICHIER_RES/*.res`
270 do 273 do
271 base=`basename $file .res`; 274 base=`basename $file .res`;
272 like=`echo "$base" | sed -e 's/\..*//'`; 275 like=`echo "$base" | sed -e 's/\..*//'`;
273 if [ -f $1/$FICHIER_RES/$base.treil ]; then 276 if [ -f $1/$FICHIER_RES/$base.treil ]; then
274 echo "$ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm"; 277 echo "$ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm";
275 $ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm 278 $ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm
276 # $ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm 279 # $ROOT/script/ExtractData.pl $pathML $nameML $POS/${base}.pos2 $file $LIKELIHOOD/${like}.likelihood $MLCLASS/${base}.mlclass $TYPE_ML > $SUPER_CTM/${base}.ctm
277 fi 280 fi
278 done 281 done
279 fi 282 fi
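
The EXTRACT loop above derives two names from each `.res` file: `base` (the file name without directory or `.res` suffix) and `like` (everything before the first dot of `base`), which selects the matching `.likelihood` file. A minimal sketch of that derivation follows; the path is a made-up example and is not taken from the script's data.

```shell
#!/bin/sh
# Hypothetical input path, for illustration only.
file="/tmp/res_p2/show.seg01.res"

# Strip the directory and the .res suffix, as the loop does.
base=$(basename "$file" .res)

# Drop everything from the first dot onward, as the sed expression does.
like=$(echo "$base" | sed -e 's/\..*//')

echo "base=$base"
echo "like=$like"
```

With this input, several segment files that share the same prefix before the first dot would all map to a single `.likelihood` file, which appears to be the intent of the `like` variable.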
280 283
281 #---------------------------------------------------------------- 284 #----------------------------------------------------------------
282 # STEP 6 - Actual computation of the confidence score for each word 285 # STEP 6 - Actual computation of the confidence score for each word
283 #---------------------------------------------------------------- 286 #----------------------------------------------------------------
284 if [ $BOOST == 1 ] 287 if [ $BOOST == 1 ]
285 then 288 then
286 echo "BOOST step..." 289 echo "BOOST step..."
287 rm -f $SCORED_CTM/* $CONF_DIR/${NAME}.sctm $CONF_DIR/${NAME}.boost* $CONF_DIR/${NAME}.resboost* $CONF_DIR/${NAME}.corres* > /dev/null 2>&1 290 rm -f $SCORED_CTM/* $CONF_DIR/${NAME}.sctm $CONF_DIR/${NAME}.boost* $CONF_DIR/${NAME}.resboost* $CONF_DIR/${NAME}.corres* > /dev/null 2>&1
288 # used for testing without labels 291 # used for testing without labels
289 $ROOT/script/DissociateErroneousFromDecoded.pl $SUPER_CTM 2 equilibre > $CONF_DIR/${NAME}.sctm 292 $ROOT/script/DissociateErroneousFromDecoded.pl $SUPER_CTM 2 equilibre > $CONF_DIR/${NAME}.sctm
290 293
291 $ROOT/script/ConvertSuperCTMtoDataSVM.pl $CONF_DIR/${NAME}.sctm boost 2 0 0 > $CONF_DIR/${NAME}.boost 294 $ROOT/script/ConvertSuperCTMtoDataSVM.pl $CONF_DIR/${NAME}.sctm boost 2 0 0 > $CONF_DIR/${NAME}.boost
292 $ROOT/script/ConvertSuperCTMtoDataSVM.pl $CONF_DIR/${NAME}.sctm boost 2 0 1 > $CONF_DIR/${NAME}.boost_refs 295 $ROOT/script/ConvertSuperCTMtoDataSVM.pl $CONF_DIR/${NAME}.sctm boost 2 0 1 > $CONF_DIR/${NAME}.boost_refs
293 296
294 $BOOST_BIN -S $ROOT/TRAIN -C --posteriors < $CONF_DIR/${NAME}.boost > $CONF_DIR/${NAME}.resboost 297 $BOOST_BIN -S $ROOT/TRAIN -C --posteriors < $CONF_DIR/${NAME}.boost > $CONF_DIR/${NAME}.resboost
295 298
296 cat $CONF_DIR/${NAME}.resboost | cut -f4 -d" " > $CONF_DIR/${NAME}.resboost2 299 cat $CONF_DIR/${NAME}.resboost | cut -f4 -d" " > $CONF_DIR/${NAME}.resboost2
297 300
298 cat $CONF_DIR/${NAME}.boost_refs | sed -e 's/.*ref=//' > $CONF_DIR/${NAME}.corres 301 cat $CONF_DIR/${NAME}.boost_refs | sed -e 's/.*ref=//' > $CONF_DIR/${NAME}.corres
299 302
300 paste $CONF_DIR/${NAME}.corres $CONF_DIR/${NAME}.resboost2 | sed -e 's/\.ctm/\.res/' > $CONF_DIR/${NAME}.corres2 303 paste $CONF_DIR/${NAME}.corres $CONF_DIR/${NAME}.resboost2 | sed -e 's/\.ctm/\.res/' > $CONF_DIR/${NAME}.corres2
301 304
302 $ROOT/script/AssociateScoreToCtm.pl $CONF_DIR/${NAME}.corres2 $1/$FICHIER_RES/ $SCORED_CTM/ 305 $ROOT/script/AssociateScoreToCtm.pl $CONF_DIR/${NAME}.corres2 $1/$FICHIER_RES/ $SCORED_CTM/
303 fi 306 fi
304 echo "END" 307 echo "END"
305 308
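
The text-processing tail of the BOOST step (`cut`, `sed`, `paste`) can be tried in isolation. The sketch below is only an illustration under stated assumptions: the real `.resboost` and `.boost_refs` layouts are not shown in the script, so the sample lines are invented, assuming a space-separated classifier output with the score in field 4 and reference lines ending in `ref=<identifier>`. The `demo.*` file names are hypothetical as well.

```shell
#!/bin/sh
# Work in a throwaway directory; all data below is invented for illustration.
tmp=$(mktemp -d)

# Assumed classifier output: space-separated, score of interest in field 4.
printf '%s\n' '0 1 0.20 0.80' '1 0 0.65 0.35' > "$tmp/demo.resboost"

# Assumed reference lines: arbitrary features, then ref=<identifier>.
printf '%s\n' 'x y ref=a.ctm 1' 'x y ref=b.ctm 2' > "$tmp/demo.boost_refs"

# Keep only the fourth field of each classifier line.
cut -f4 -d" " < "$tmp/demo.resboost" > "$tmp/demo.resboost2"

# Keep only what follows "ref=" on each reference line.
sed -e 's/.*ref=//' < "$tmp/demo.boost_refs" > "$tmp/demo.corres"

# Join both files line by line (tab-separated) and rename .ctm to .res.
paste "$tmp/demo.corres" "$tmp/demo.resboost2" | sed -e 's/\.ctm/.res/' > "$tmp/demo.corres2"

result=$(cat "$tmp/demo.corres2")
echo "$result"
rm -rf "$tmp"
```

Reading the inputs by redirection avoids the extra `cat` processes, and the replacement side of the last `sed` uses a plain `.res`, since a backslash before a dot is only needed in the pattern.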