Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f

Authored by Rouvier Mickael
1 parent f5fbe398d6
Exists in master

Full-system

Showing 1 changed file with 106 additions and 0 deletions

# DEFT 2017 - Sentiment Analysis

- Authors: Mickael Rouvier and Pierre-Michel Bousquet
- Version: 1.0
- Date: 26/06/17

These scripts provide the LIA system that I used for the DEFT 2017 Sentiment Analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The final system fuses the different CNN variants at the score level.

You can reproduce my results or freely adapt my code for your experiments.

Before running the system, build it with the makefile:

```shell
make
```

This script splits the training corpus (K-fold) and tokenizes the tweets:

```shell
sh run_corpus.sh
```
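
For intuition, a K-fold split can be sketched in a few lines of shell. This is only an illustration and not the contents of `run_corpus.sh`: the function name `kfold_split`, the round-robin assignment of lines to folds, and the output names `fold_<i>.txt` are all assumptions.

```shell
# Hypothetical helper, not part of the repository: split a file with one
# example per line into K folds by sending line i to fold ((i - 1) mod K).
kfold_split() {
    input=$1    # corpus file, one example per line
    k=$2        # number of folds
    outdir=$3   # directory that receives fold_0.txt ... fold_<K-1>.txt
    mkdir -p "$outdir"
    awk -v k="$k" -v dir="$outdir" '{
        # Build the target file name first, then redirect the line to it.
        out = dir "/fold_" ((NR - 1) % k) ".txt"
        print > out
    }' "$input"
}
```

With such folds, each model would be trained on K-1 folds and evaluated on the remaining one, rotating the held-out fold.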

This script trains the different word embeddings:

```shell
sh run_word2vec.sh
```

This script trains the CNN models:

```shell
sh run_cnn.sh
```

These scripts run the models on the dev and test sets:

```shell
sh run_extract_dev.sh
sh run_extract_test.sh
```

At this point you can score the individual CNNs:

```shell
ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
```
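
The three scoring commands differ only in the task number, so they can be wrapped in a loop. The sketch below is a convenience and not part of the repository: the function name `score_cnn_variant` and the `SCORER` override are assumptions, while the file-name pattern is copied from the commands above.

```shell
# Hypothetical helper: score one CNN variant on the three tasks.
score_cnn_variant() {
    suffix=$1                               # e.g. 0_distant_size100_123
    scorer=${SCORER:-ruby bin/scoring.rb}   # SCORER is only meant for dry runs
    for task in 1 2 3; do
        # $scorer is left unquoted on purpose so that "ruby bin/scoring.rb"
        # is split into a command and its first argument.
        $scorer "data/task${task}_test.tokenize" \
            "results_test/cnn_task${task}_${suffix}.txt"
    done
}
```

Calling `score_cnn_variant 0_distant_size100_123` from the repository root should be equivalent to the three commands above. Setting `SCORER=echo` prints the file pairs instead of scoring them.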

This script runs the fusion system:

```shell
sh run_fusion.sh
```
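
To illustrate what fusion at the score level means, here is a minimal sketch that combines several systems by unweighted averaging. It is a stand-in for explanation only: the actual fusion is whatever `run_fusion.sh` implements, and the function name `fuse_scores` and the input format (one line per tweet, one whitespace-separated score per class, same tweet order in every file) are assumptions.

```shell
# Hypothetical sketch of score-level fusion by unweighted averaging.
# Assumed format, not confirmed by the repository: every input file has one
# line per tweet with one score per class, in the same tweet order.
fuse_scores() {
    nfiles=$#
    # paste puts line i of every file side by side; awk then adds up the
    # scores class by class. With a fixed number of systems, the sum ranks
    # the classes exactly like the mean, so no division is needed.
    paste "$@" | awk -v n="$nfiles" '{
        ncls = NF / n                      # number of classes per system
        for (c = 1; c <= ncls; c++) {
            sum = 0
            for (s = 0; s < n; s++) sum += $(s * ncls + c)
            if (c == 1 || sum > best_sum) { best = c; best_sum = sum }
        }
        print best                         # 1-based index of the winning class
    }'
}
```

The point of fusing scores rather than hard decisions is that a confident system can outvote several hesitant ones.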

Finally, you can score the full system:

```shell
ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
```
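
The build, training, extraction and fusion steps can also be chained so that a failure stops the run early. The wrapper below is a sketch and not part of the repository: the function name `run_pipeline` and the `RUNNER` variable are assumptions, and only the commands themselves come from this README.

```shell
# Hypothetical wrapper: run the steps of this README in order and stop at
# the first one that fails.
run_pipeline() {
    # RUNNER is an assumption of this sketch: leave it empty to execute the
    # commands, or set RUNNER=echo to print them without running anything.
    runner=${RUNNER:-}
    $runner make &&
    $runner sh run_corpus.sh &&
    $runner sh run_word2vec.sh &&
    $runner sh run_cnn.sh &&
    $runner sh run_extract_dev.sh &&
    $runner sh run_extract_test.sh &&
    $runner sh run_fusion.sh
}
```

`RUNNER=echo run_pipeline` lists the seven commands. A plain `run_pipeline` executes them and is meant to be started from the repository root.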

# Results

## Baseline

As a baseline, we reproduced the sentiment analysis system of Kim (word embeddings fed to a CNN):

| Corpus | Baseline |
| ------------ |:-------------:|
| Task1 | 59.55 |
| Task2 | 77.18 |
| Task3 | 57.59 |

## DEFT 2017

These are the results of the three runs of the LIA system at DEFT 2017 Sentiment Analysis:

| Run | Task1 | Task2 | Task3 |
| ----------- |:-------------:|:-------------:|:-------------:|
| Run1 | 60.23 | 78.31 | 57.83 |
| Run2 | 63.44 | 77.39 | 58.49 |
| Run3 | 65.00 | 77.43 | 59.39 |

# Citing

The system is described in this paper:

    @inproceedings{rouvier2017,
      author = {Mickael Rouvier and Pierre-Michel Bousquet},
      title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
      booktitle = {DEFT 2017},
      year = {2017},
      address = {Orleans, France}
    }