Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f
1 parent
f5fbe398d6
Exists in
master
Full-system
Showing 1 changed file with 106 additions and 0 deletions
README
1 | +# DEFT 2017 - Sentiment Analysis | |
2 | + | |
3 | +- Authors: Mickael Rouvier and Pierre-Michel Bousquet | |
4 | +- Version: 1.0 | |
5 | +- Date: 26/06/17 | |
6 | + | |
7 | +These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the input of the CNNs: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The system is a score-level fusion of the different CNN variants. | |
8 | + | |
9 | + | |
10 | +You can reproduce my results or freely adapt my code for your experiments. | |
11 | + | |
12 | + | |
13 | +Warning: before running the system, build it with the makefile: | |
14 | + | |
15 | +```shell | |
16 | +make | |
17 | +``` | |
18 | + | |
19 | +This script splits the training corpus (K-fold) and tokenizes the tweets: | |
20 | + | |
21 | +```shell | |
22 | +sh run_corpus.sh | |
23 | +``` | |
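As a rough illustration of what a K-fold split followed by tokenization looks like, here is a minimal Python sketch. It is not the code behind `run_corpus.sh`: the function names `k_fold_split` and `tokenize`, the number of folds, the seed, and the lowercase whitespace tokenizer are all assumptions made for the example.

```python
import random


def k_fold_split(items, k=5, seed=123):
    """Shuffle items once, then yield (train, held_out) pairs, one per fold.

    Hypothetical helper: k and seed are illustrative defaults only.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th item starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


def tokenize(tweet):
    """Very simple stand-in tokenizer: lowercase, then split on whitespace."""
    return tweet.lower().split()


if __name__ == "__main__":
    tweets = ["Tweet number %d" % i for i in range(10)]
    for fold_id, (train, held_out) in enumerate(k_fold_split(tweets, k=5)):
        print(fold_id, len(train), len(held_out), tokenize(held_out[0]))
```

Each tweet lands in exactly one held-out fold, so every model is evaluated on data it did not see during training.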
24 | + | |
25 | +This script trains the different word embeddings: | |
26 | +```shell | |
27 | +sh run_word2vec.sh | |
28 | +``` | |
29 | + | |
30 | +This script trains the models: | |
31 | + | |
32 | +```shell | |
33 | +sh run_cnn.sh | |
34 | +``` | |
35 | + | |
36 | +This script runs the models on the dev and test sets: | |
37 | + | |
38 | +```shell | |
39 | +sh run_extract_dev.sh | |
40 | +sh run_extract_test.sh | |
41 | +``` | |
42 | + | |
43 | +At this point you can score the CNNs: | |
44 | +```shell | |
45 | +ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt | |
46 | +ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt | |
47 | +ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt | |
48 | +``` | |
49 | + | |
50 | + | |
51 | +This script runs the fusion system: | |
52 | +```shell | |
53 | +sh run_fusion.sh | |
54 | +``` | |
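Score-level fusion means combining the per-class scores produced by each CNN rather than their final decisions. The README does not say how `run_fusion.sh` combines them, so the sketch below assumes a plain (optionally weighted) average followed by an argmax. The names `fuse_scores` and `predict`, the label set, and the example scores are hypothetical.

```python
def fuse_scores(model_scores, weights=None):
    """Combine per-class scores from several models into one score per class.

    model_scores: list of lists, one inner list of class scores per model.
    weights: optional per-model weights; defaults to a uniform average.
    Assumption: a weighted average, not necessarily the fusion used by the LIA system.
    """
    n_models = len(model_scores)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(model_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, model_scores))
        for c in range(n_classes)
    ]


def predict(fused, labels):
    """Return the label whose fused score is highest."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


if __name__ == "__main__":
    labels = ["negative", "objective", "positive"]  # illustrative label set
    scores = [
        [0.2, 0.5, 0.3],  # CNN initialized with one embedding
        [0.1, 0.3, 0.6],  # CNN initialized with another embedding
        [0.3, 0.3, 0.4],  # CNN initialized with a third embedding
    ]
    print(predict(fuse_scores(scores), labels))
```

Fusing scores instead of hard votes lets a confident model outweigh two hesitant ones, which is one common reason to prefer it in ensembles.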
55 | + | |
56 | + | |
57 | +Finally, you can score the full system: | |
58 | +```shell | |
59 | +ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv | |
60 | +ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv | |
61 | +ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv | |
62 | +``` | |
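The README does not state which metric `bin/scoring.rb` reports. Purely as an assumption, the sketch below computes a macro-averaged F1 score, a common choice for multi-class sentiment tasks; the function name `macro_f1` and the example labels are hypothetical, and the result may not match the Ruby script.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores.

    Assumption: this may differ from what bin/scoring.rb actually computes.
    """
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "negative"]
    pred = ["positive", "negative", "negative", "negative"]
    print(round(100 * macro_f1(gold, pred), 2))
```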
63 | + | |
64 | + | |
65 | + | |
66 | +# Results | |
67 | + | |
68 | + | |
69 | +## Baseline | |
70 | + | |
71 | +We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN): | |
72 | + | |
73 | + | |
74 | +| Corpus | Baseline | | |
75 | +| ------------ |:-------------:| | |
76 | +| Task1 | 59.55 | | |
77 | +| Task2 | 77.18 | | |
78 | +| Task3 | 57.59 | | |
79 | + | |
80 | + | |
81 | + | |
82 | + | |
83 | +## DEFT 2017 | |
84 | + | |
85 | +These results are those of the SENSEI-LIF system presented at SemEval 2016 Sentiment Analysis: | |
86 | + | |
87 | +| Run | Task1 | Task2 | Task3 | |
88 | +| ----------- |:-------------:|:-------------:|:-------------:| | |
89 | +| Run1 | 60.23 | 78.31 | 57.83 | | |
90 | +| Run2 | 63.44 | 77.39 | 58.49 | | |
91 | +| Run3 | 65.00 | 77.43 | 59.39 | | |
92 | + | |
93 | + | |
94 | +# Citing | |
95 | + | |
96 | +The system is described in this paper: | |
97 | + | |
98 | + @inproceedings{rouvier2017, | |
99 | + author = {Mickael Rouvier and Pierre-Michel Bousquet}, | |
100 | + title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network}, | |
101 | + booktitle = {DEFT 2017}, | |
102 | + year = {2017}, | |
103 | + address = {Orleans, France} | |
104 | + } |