Commit 2dca315085d911c2612883625e1e06ec589fc623

Authored by Rouvier Mickael
1 parent d2e52398d0
Exists in master

Move README in README.md

Showing 2 changed files with 106 additions and 106 deletions

1   -# DEFT 2017 - Sentiment Analysis
2   -
3   -- Authors: Mickael Rouvier and Pierre-Michel Bousquet
4   -- Version: 1.0
5   -- Date: 26/06/17
6   -
7   -These scripts provide the LIA system that I used for the DEFT 2017 sentiment analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The final system fuses the different CNN variants at the score level.
8   -
9   -
10   -You can reproduce my results or freely adapt my code for your experiments.
11   -
12   -
13   -Before running the system, run the makefile:
14   -
15   -```shell
16   -make
17   -```
18   -
19   -This script splits the training corpus (K-fold) and tokenizes the tweets:
20   -
21   -```shell
22   -sh run_corpus.sh
23   -```
24   -
25   -This script trains the different word embeddings:
26   -```shell
27   -sh run_word2vec.sh
28   -```
29   -
30   -This script trains the models:
31   -
32   -```shell
33   -sh run_cnn.sh
34   -```
35   -
36   -This script runs the models on the dev and test sets:
37   -
38   -```shell
39   -sh run_extract_dev.sh
40   -sh run_extract_test.sh
41   -```
42   -
43   -At this point you can score the CNNs:
44   -```shell
45   -ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
46   -ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
47   -ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
48   -```
49   -
50   -
51   -This script runs the fusion system:
52   -```shell
53   -sh run_fusion.sh
54   -```
55   -
56   -
57   -Finally, you can score the full system:
58   -```shell
59   -ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
60   -ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
61   -ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
62   -```
63   -
64   -
65   -
66   -# Results
67   -
68   -
69   -## Baseline
70   -
71   -We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
72   -
73   -
74   -| Corpus | Baseline |
75   -| ------------ |:-------------:|
76   -| Task1 | 59.55 |
77   -| Task2 | 77.18 |
78   -| Task3 | 57.59 |
79   -
80   -
81   -
82   -
83   -## DEFT 2017
84   -
85   -These results are those of the SENSEI-LIF system presented in SemEval 2016 Sentiment Analysis:
86   -
87   -| Run | Task1 | Task2 | Task3 |
88   -| ----------- |:-------------:|:-------------:|:-------------:|
89   -| Run1 | 60.23 | 78.31 | 57.83 |
90   -| Run2 | 63.44 | 77.39 | 58.49 |
91   -| Run3 | 65.00 | 77.43 | 59.39 |
92   -
93   -
94   -# Citing
95   -
96   -The system is described in this paper:
97   -
98   - @inproceedings{rouvier2017,
99   - author = {Mickael Rouvier and Pierre-Michel Bousquet},
100   - title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
101   - booktitle = {DEFT 2017},
102   - year = {2017},
103   - address = {Orleans, France}
104   - }
  1 +# DEFT 2017 - Sentiment Analysis
  2 +
  3 +- Authors: Mickael Rouvier and Pierre-Michel Bousquet
  4 +- Version: 1.0
  5 +- Date: 26/06/17
  6 +
  7 +These scripts provide the LIA system that I used for the DEFT 2017 sentiment analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The final system fuses the different CNN variants at the score level.
  8 +
  9 +
  10 +You can reproduce my results or freely adapt my code for your experiments.
  11 +
  12 +
  13 +Before running the system, run the makefile:
  14 +
  15 +```shell
  16 +make
  17 +```
  18 +
  19 +This script splits the training corpus (K-fold) and tokenizes the tweets:
  20 +
  21 +```shell
  22 +sh run_corpus.sh
  23 +```
  24 +
  25 +This script trains the different word embeddings:
  26 +```shell
  27 +sh run_word2vec.sh
  28 +```
  29 +
  30 +This script trains the models:
  31 +
  32 +```shell
  33 +sh run_cnn.sh
  34 +```
  35 +
  36 +This script runs the models on the dev and test sets:
  37 +
  38 +```shell
  39 +sh run_extract_dev.sh
  40 +sh run_extract_test.sh
  41 +```
  42 +
  43 +At this point you can score the CNNs:
  44 +```shell
  45 +ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
  46 +ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
  47 +ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
  48 +```
  49 +
  50 +
  51 +This script runs the fusion system:
  52 +```shell
  53 +sh run_fusion.sh
  54 +```
  55 +
  56 +
  57 +Finally, you can score the full system:
  58 +```shell
  59 +ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
  60 +ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
  61 +ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
  62 +```
  63 +
  64 +
  65 +
  66 +# Results
  67 +
  68 +
  69 +## Baseline
  70 +
  71 +We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
  72 +
  73 +
  74 +| Corpus | Baseline |
  75 +| ------------ |:-------------:|
  76 +| Task1 | 59.55 |
  77 +| Task2 | 77.18 |
  78 +| Task3 | 57.59 |
  79 +
  80 +
  81 +
  82 +
  83 +## DEFT 2017
  84 +
  85 +These results are those of the SENSEI-LIF system presented in SemEval 2016 Sentiment Analysis:
  86 +
  87 +| Run | Task1 | Task2 | Task3 |
  88 +| ----------- |:-------------:|:-------------:|:-------------:|
  89 +| Run1 | 60.23 | 78.31 | 57.83 |
  90 +| Run2 | 63.44 | 77.39 | 58.49 |
  91 +| Run3 | 65.00 | 77.43 | 59.39 |
  92 +
  93 +
  94 +# Citing
  95 +
  96 +The system is described in this paper:
  97 +
  98 + @inproceedings{rouvier2017,
  99 + author = {Mickael Rouvier and Pierre-Michel Bousquet},
  100 + title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
  101 + booktitle = {DEFT 2017},
  102 + year = {2017},
  103 + address = {Orleans, France}
  104 + }
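
Note on the README content: it describes the final system as a fusion at the score level of the CNN variants, but does not say how the scores are combined. The sketch below illustrates one common form of score-level fusion, a weighted average of per-class scores followed by an argmax. It is an assumption for illustration only: the function `fuse_scores`, the toy scores, and the uniform default weights are hypothetical, and the actual `run_fusion.sh` may work differently.

```python
from typing import List, Optional, Sequence


def fuse_scores(
    system_scores: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Fuse several systems at the score level.

    system_scores[s][i][c] is the score system s assigns to class c
    for example i. Returns one predicted class index per example.
    """
    n_systems = len(system_scores)
    if weights is None:
        # Assumption: every system contributes equally.
        weights = [1.0 / n_systems] * n_systems

    predictions = []
    for i in range(len(system_scores[0])):
        n_classes = len(system_scores[0][i])
        # Weighted sum of the scores each system gives to each class.
        fused = [
            sum(w * scores[i][c] for w, scores in zip(weights, system_scores))
            for c in range(n_classes)
        ]
        # Predict the class with the highest fused score.
        predictions.append(max(range(n_classes), key=fused.__getitem__))
    return predictions


# Toy scores from three hypothetical CNN variants: two tweets, three classes.
lexical = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
distant = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
multitask = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

print(fuse_scores([lexical, distant, multitask]))  # prints [1, 2]
```

For the first toy tweet, two of the three variants individually prefer class 0, yet the fused scores favour class 1 because the third variant is much more confident. This is the practical difference between fusing scores and taking a majority vote over the systems' decisions.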