Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f
1 parent
f5fbe398d6
Exists in
master
Full-system
Showing 1 changed file with 106 additions and 0 deletions
README
# DEFT 2017 - Sentiment Analysis

- Authors: Mickael Rouvier and Pierre-Michel Bousquet
- Version: 1.0
- Date: 26/06/17

These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The system is a score-level fusion of the different CNN variants.

You can reproduce my results or freely adapt my code for your experiments.

Before running the system, build it with the makefile:

```shell
make
```

This script splits the training corpus (K-fold) and tokenizes the tweets:

```shell
sh run_corpus.sh
```
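
The real splitting logic lives in `run_corpus.sh`. As a rough illustration of what a K-fold split does, here is a minimal sketch that assigns lines round-robin to folds. The function name `kfold_split`, the choice of 5 folds, the one-example-per-line layout, and the `fold_<i>.dev` / `fold_<i>.train` file names are all assumptions made for this example, not details taken from the repository.

```shell
#!/bin/sh
# Illustration only: round-robin K-fold split of a line-oriented corpus.
# kfold_split, the fold file names, and K=5 are hypothetical choices;
# they are not taken from run_corpus.sh.
set -eu

kfold_split() {
    corpus="$1"   # input file, one example per line (assumed layout)
    k="$2"        # number of folds
    outdir="$3"   # directory receiving fold_<i>.dev and fold_<i>.train
    mkdir -p "$outdir"
    i=0
    while [ "$i" -lt "$k" ]; do
        # Line n is held out in fold (n-1) mod k and used for training elsewhere.
        awk -v k="$k" -v i="$i" \
            -v dev="$outdir/fold_$i.dev" -v train="$outdir/fold_$i.train" \
            '{ if ((NR - 1) % k == i) print > dev; else print > train }' "$corpus"
        i=$((i + 1))
    done
}

# Toy usage: split a 10-line corpus into 5 folds and inspect fold 0.
tmp=$(mktemp -d)
printf '%s\n' t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 > "$tmp/toy.txt"
kfold_split "$tmp/toy.txt" 5 "$tmp/folds"
dev0=$(grep -c '' "$tmp/folds/fold_0.dev")
train0=$(grep -c '' "$tmp/folds/fold_0.train")
echo "fold 0: $dev0 dev lines, $train0 train lines"
rm -r "$tmp"
```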

This script trains the different word embeddings:

```shell
sh run_word2vec.sh
```

This script trains the CNN models:

```shell
sh run_cnn.sh
```

These scripts run the models on the dev and test sets:

```shell
sh run_extract_dev.sh
sh run_extract_test.sh
```

At this point you can score the individual CNNs:

```shell
ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
```

This script runs the fusion system:

```shell
sh run_fusion.sh
```
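
To make the idea of score-level fusion concrete, the sketch below averages the per-class scores of several systems, line by line. This is only the simplest possible instance of the technique: `run_fusion.sh` may well use weights or a trained combiner instead. The function name `fuse_scores` and the input format (one example per line, whitespace-separated class scores, same layout in every file) are assumptions for this example and may not match the files in `results_test/`.

```shell
#!/bin/sh
# Illustration only: unweighted score-level fusion of several systems.
# Assumed (hypothetical) format: one example per line, one score per class,
# whitespace-separated, identical layout in every input file.
set -eu

fuse_scores() {
    awk '
        FNR == 1 { nfiles++ }    # first line of a new input file
        {
            # Accumulate each score by (row, column) position.
            for (c = 1; c <= NF; c++) sum[FNR, c] += $c
            if (FNR > nrows) nrows = FNR
            if (NF > ncols) ncols = NF
        }
        END {
            # Emit the mean score per class for every example.
            for (r = 1; r <= nrows; r++) {
                line = ""
                for (c = 1; c <= ncols; c++) {
                    sep = (c > 1 ? " " : "")
                    line = line sep sprintf("%.4f", sum[r, c] / nfiles)
                }
                print line
            }
        }
    ' "$@"
}

# Toy usage: two systems, two examples, two classes.
tmp=$(mktemp -d)
printf '0.2 0.8\n0.6 0.4\n' > "$tmp/sys_a.txt"
printf '0.4 0.6\n0.2 0.8\n' > "$tmp/sys_b.txt"
fused=$(fuse_scores "$tmp/sys_a.txt" "$tmp/sys_b.txt")
echo "$fused"
rm -r "$tmp"
```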

Finally, you can score the full system:

```shell
ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
```
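
The steps above can be chained in a small wrapper, sketched here. The commands and their order are the ones listed in this README; the wrapper itself, the `pipeline_steps` function, and the `DRY_RUN` variable are hypothetical additions. By default the script only prints each command, so it is safe to try anywhere; set `DRY_RUN=0` and run it from the repository root to actually execute the steps. The per-CNN scoring step is left out for brevity.

```shell
#!/bin/sh
# Sketch of a wrapper chaining the documented steps in order.
# pipeline_steps and DRY_RUN are hypothetical names introduced here.
# Default is a dry run; use DRY_RUN=0 from the repository root to execute.
set -eu

pipeline_steps() {
    cat <<'EOF'
make
sh run_corpus.sh
sh run_word2vec.sh
sh run_cnn.sh
sh run_extract_dev.sh
sh run_extract_test.sh
sh run_fusion.sh
ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
EOF
}

pipeline_steps | while IFS= read -r step; do
    echo "+ $step"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        # Detach stdin so a step cannot consume the remaining step list.
        sh -c "$step" </dev/null
    fi
done
```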

# Results

## Baseline

As a baseline, we reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):

| Task | Baseline |
| ------------ |:-------------:|
| Task1 | 59.55 |
| Task2 | 77.18 |
| Task3 | 57.59 |

## DEFT 2017

These are the results of the LIA system runs submitted to DEFT 2017 - Sentiment Analysis:

| Run | Task1 | Task2 | Task3 |
| ----------- |:-------------:|:-------------:|:-------------:|
| Run1 | 60.23 | 78.31 | 57.83 |
| Run2 | 63.44 | 77.39 | 58.49 |
| Run3 | 65.00 | 77.43 | 59.39 |

# Citing

The system is described in this paper:

    @inproceedings{rouvier2017,
      author = {Mickael Rouvier and Pierre-Michel Bousquet},
      title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
      booktitle = {DEFT 2017},
      year = {2017},
      address = {Orleans, France}
    }