Commit 2dca315085d911c2612883625e1e06ec589fc623
1 parent
d2e52398d0
Exists in
master
Move README to README.md
Showing 2 changed files with 106 additions and 106 deletions
README
1 | -# DEFT 2017 - Sentiment Analysis | |
2 | - | |
3 | -- Authors: Mickael Rouvier and Pierre-Michel Bousquet | |
4 | -- Version: 1.0 | |
5 | -- Date: 26/06/17 | |
6 | - | |
7 | -These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embedding, sentiment embedding (multi-task learning), sentiment embedding (distant learning) and sentiment embedding (negative sampling). The system fuses the different CNN variants at the score level. | |
8 | - | |
9 | - | |
10 | -You can reproduce my results or freely adapt my code for your experiments. | |
11 | - | |
12 | - | |
13 | -Before running the system, build it with the makefile: | |
14 | - | |
15 | -```shell | |
16 | -make | |
17 | -``` | |
18 | - | |
19 | -This script splits the training corpus (K-fold) and tokenizes the tweets: | |
20 | - | |
21 | -```shell | |
22 | -sh run_corpus.sh | |
23 | -``` | |
24 | - | |
25 | -This script trains the different word embeddings: | |
26 | -```shell | |
27 | -sh run_word2vec.sh | |
28 | -``` | |
29 | - | |
30 | -This script trains the models: | |
31 | - | |
32 | -```shell | |
33 | -sh run_cnn.sh | |
34 | -``` | |
35 | - | |
36 | -This script runs the models on the dev and test sets: | |
37 | - | |
38 | -```shell | |
39 | -sh run_extract_dev.sh | |
40 | -sh run_extract_test.sh | |
41 | -``` | |
42 | - | |
43 | -At this point you can score the CNNs: | |
44 | -```shell | |
45 | -ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt | |
46 | -ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt | |
47 | -ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt | |
48 | -``` | |
49 | - | |
50 | - | |
51 | -This script runs the fusion system: | |
52 | -```shell | |
53 | -sh run_fusion.sh | |
54 | -``` | |
55 | - | |
56 | - | |
57 | -Finally, you can score the full system: | |
58 | -```shell | |
59 | -ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv | |
60 | -ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv | |
61 | -ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv | |
62 | -``` | |
63 | - | |
64 | - | |
65 | - | |
66 | -# Results | |
67 | - | |
68 | - | |
69 | -## Baseline | |
70 | - | |
71 | -We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN): | |
72 | - | |
73 | - | |
74 | -| Corpus | Baseline | | |
75 | -| ------------ |:-------------:| | |
76 | -| Task1 | 59.55 | | |
77 | -| Task2 | 77.18 | | |
78 | -| Task3 | 57.59 | | |
79 | - | |
80 | - | |
81 | - | |
82 | - | |
83 | -## DEFT 2017 | |
84 | - | |
85 | -These are the results of the LIA system runs on DEFT 2017: | |
86 | - | |
87 | -| Corpus | Task1 | Task2 | Task3 | | |
88 | -| ----------- |:-------------:|:-------------:|:-------------:| | |
89 | -| Run1 | 60.23 | 78.31 | 57.83 | | |
90 | -| Run2 | 63.44 | 77.39 | 58.49 | | |
91 | -| Run3 | 65.00 | 77.43 | 59.39 | | |
92 | - | |
93 | - | |
94 | -# Citing | |
95 | - | |
96 | -The system is described in this paper: | |
97 | - | |
98 | - @inproceedings{rouvier2017, | |
99 | - author = {Mickael Rouvier and Pierre-Michel Bousquet}, | |
100 | - title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network}, | |
101 | - booktitle = {DEFT 2017}, | |
102 | - year = {2017}, | |
103 | - address = {Orleans, France} | |
104 | - } |
README.md
1 | +# DEFT 2017 - Sentiment Analysis | |
2 | + | |
3 | +- Authors: Mickael Rouvier and Pierre-Michel Bousquet | |
4 | +- Version: 1.0 | |
5 | +- Date: 26/06/17 | |
6 | + | |
7 | +These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embedding, sentiment embedding (multi-task learning), sentiment embedding (distant learning) and sentiment embedding (negative sampling). The system fuses the different CNN variants at the score level. | |
8 | + | |
9 | + | |
10 | +You can reproduce my results or freely adapt my code for your experiments. | |
11 | + | |
12 | + | |
13 | +Before running the system, build it with the makefile: | |
14 | + | |
15 | +```shell | |
16 | +make | |
17 | +``` | |
18 | + | |
19 | +This script splits the training corpus (K-fold) and tokenizes the tweets: | |
20 | + | |
21 | +```shell | |
22 | +sh run_corpus.sh | |
23 | +``` | |
24 | + | |
25 | +This script trains the different word embeddings: | |
26 | +```shell | |
27 | +sh run_word2vec.sh | |
28 | +``` | |
29 | + | |
30 | +This script trains the models: | |
31 | + | |
32 | +```shell | |
33 | +sh run_cnn.sh | |
34 | +``` | |
35 | + | |
36 | +This script runs the models on the dev and test sets: | |
37 | + | |
38 | +```shell | |
39 | +sh run_extract_dev.sh | |
40 | +sh run_extract_test.sh | |
41 | +``` | |
42 | + | |
43 | +At this point you can score the CNNs: | |
44 | +```shell | |
45 | +ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt | |
46 | +ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt | |
47 | +ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt | |
48 | +``` | |
49 | + | |
50 | + | |
51 | +This script runs the fusion system: | |
52 | +```shell | |
53 | +sh run_fusion.sh | |
54 | +``` | |
55 | + | |
56 | + | |
57 | +Finally, you can score the full system: | |
58 | +```shell | |
59 | +ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv | |
60 | +ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv | |
61 | +ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv | |
62 | +``` | |
63 | + | |
64 | + | |
65 | + | |
66 | +# Results | |
67 | + | |
68 | + | |
69 | +## Baseline | |
70 | + | |
71 | +We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN): | |
72 | + | |
73 | + | |
74 | +| Corpus | Baseline | | |
75 | +| ------------ |:-------------:| | |
76 | +| Task1 | 59.55 | | |
77 | +| Task2 | 77.18 | | |
78 | +| Task3 | 57.59 | | |
79 | + | |
80 | + | |
81 | + | |
82 | + | |
83 | +## DEFT 2017 | |
84 | + | |
85 | +These are the results of the LIA system runs on DEFT 2017: | |
86 | + | |
87 | +| Corpus | Task1 | Task2 | Task3 | | |
88 | +| ----------- |:-------------:|:-------------:|:-------------:| | |
89 | +| Run1 | 60.23 | 78.31 | 57.83 | | |
90 | +| Run2 | 63.44 | 77.39 | 58.49 | | |
91 | +| Run3 | 65.00 | 77.43 | 59.39 | | |
92 | + | |
93 | + | |
94 | +# Citing | |
95 | + | |
96 | +The system is described in this paper: | |
97 | + | |
98 | + @inproceedings{rouvier2017, | |
99 | + author = {Mickael Rouvier and Pierre-Michel Bousquet}, | |
100 | + title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network}, | |
101 | + booktitle = {DEFT 2017}, | |
102 | + year = {2017}, | |
103 | + address = {Orleans, France} | |
104 | + } |
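
The README describes the final system as a score-level fusion of the CNN variants. The contents of `run_fusion.sh` are not part of this diff, so the following is only a minimal sketch of what score-level fusion can look like, assuming each CNN outputs one score per label for a tweet. The helper names `fuse_scores` and `predict`, the label names, the scores, and the optional weights are all hypothetical and chosen for illustration; the actual fusion method used by the LIA system may differ.

```ruby
# Hypothetical sketch of score-level fusion (not the code behind run_fusion.sh).
# Each system is assumed to give a Hash mapping a label to a score.

# Combine per-label scores from several systems with a weighted average.
# When no weights are given, every system contributes equally.
def fuse_scores(system_scores, weights = nil)
  weights ||= Array.new(system_scores.size, 1.0)
  total = weights.sum
  fused = Hash.new(0.0)
  system_scores.each_with_index do |scores, i|
    scores.each do |label, score|
      fused[label] += weights[i] * score / total
    end
  end
  fused
end

# Return the label with the highest fused score.
def predict(system_scores, weights = nil)
  fuse_scores(system_scores, weights).max_by { |_label, score| score }.first
end

# Illustrative scores for one tweet from three hypothetical CNN views.
views = [
  { "positive" => 0.6, "negative" => 0.4 },
  { "positive" => 0.3, "negative" => 0.7 },
  { "positive" => 0.8, "negative" => 0.2 }
]

puts predict(views)
```

Passing a `weights` array lets one view count more than another; how such weights would be chosen (for example on the dev set) is left open here.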