  # DEFT 2017 - Sentiment Analysis
  
  - Authors: Mickael Rouvier and Pierre-Michel Bousquet
  - Version: 1.0
  - Date: 26/06/17
  
  These scripts provide the LIA system that I used for the DEFT 2017 Sentiment Analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The final system fuses the different CNN variants at the score level.
  
  
  You can reproduce my results or freely adapt my code for your experiments.
  
  
  Before running the system, build it with the makefile:
  
  ```shell
  make
  ```
  
  This script splits the training corpus (K-fold) and tokenizes the tweets:
  
  ```shell
  sh run_corpus.sh
  ```
  
  This script trains the different word embeddings:

  ```shell
  sh run_word2vec.sh
  ```
  
  This script trains the CNN models:
  
  ```shell
  sh run_cnn.sh
  ```
  
  These scripts run the models on the dev and test sets:
  
  ```shell
  sh run_extract_dev.sh
  sh run_extract_test.sh
  ```
  At this point you can evaluate the CNNs:
  ```shell
  ruby bin/scoring.rb data/task1_test.tokenize  results_test/cnn_task1_0_distant_size100_123.txt
  ruby bin/scoring.rb data/task2_test.tokenize  results_test/cnn_task2_0_distant_size100_123.txt
  ruby bin/scoring.rb data/task3_test.tokenize  results_test/cnn_task3_0_distant_size100_123.txt
  ```
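
The three scoring commands above differ only in the task number, so they can be generated in a loop. The sketch below is one way to do that. The `DRY_RUN` switch and the `score_cmd` helper are illustrative names, not part of the original scripts. By default the commands are only printed, and setting `DRY_RUN=0` runs them.

```shell
#!/bin/sh
# Print (or run) the scoring command for each of the three DEFT tasks.
# DRY_RUN is an illustrative switch, not part of the original scripts.
# It defaults to 1, so the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Build the scoring command for one task number,
# reusing the file names shown in the README.
score_cmd() {
  printf 'ruby bin/scoring.rb data/task%s_test.tokenize results_test/cnn_task%s_0_distant_size100_123.txt\n' "$1" "$1"
}

for task in 1 2 3; do
  cmd=$(score_cmd "$task")
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    $cmd   # unquoted on purpose: the command string is split on spaces
  fi
done
```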
  
  
  This script runs the fusion system:
  ```shell
  sh run_fusion.sh
  ```
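
To illustrate what fusion at the score level means, here is a minimal sketch that averages per-class scores from several systems and keeps the best class for each tweet. It is not the method implemented in `run_fusion.sh`. The input format (one line per tweet, one whitespace-separated score per class, same class order in every file), the unweighted average, and the `fuse_scores` name are all assumptions made for the example.

```shell
#!/bin/sh
# Average class scores across several systems and pick the best class per tweet.
# ASSUMPTION: each input file has one line per tweet and one whitespace-separated
# score per class, in the same order in every file. The real CNN output format
# and the real fusion rule used by run_fusion.sh may differ.
fuse_scores() {
  # paste joins line i of every file; awk then sees all systems' scores at once.
  paste "$@" | LC_ALL=C awk -v n="$#" '
    {
      k = NF / n                       # number of classes per system
      for (c = 1; c <= k; c++) {
        sum = 0
        for (s = 0; s < n; s++) sum += $(s * k + c)
        avg = sum / n
        if (c == 1 || avg > best_score) { best = c; best_score = avg }
      }
      printf "%d %.4f\n", best, best_score   # class index (1-based), fused score
    }'
}

# Toy example with two hypothetical systems, two tweets and three classes.
tmp=$(mktemp -d)
printf '0.7 0.2 0.1\n0.1 0.3 0.6\n' > "$tmp/sys_a.txt"
printf '0.5 0.4 0.1\n0.2 0.5 0.3\n' > "$tmp/sys_b.txt"
fuse_scores "$tmp/sys_a.txt" "$tmp/sys_b.txt"
rm -rf "$tmp"
```

In the toy example, the first tweet is assigned class 1 and the second class 3, because those classes have the highest averaged scores.
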
  Finally, you can evaluate the full system:
  ```shell
  ruby bin/scoring.rb data/task1_test.tokenize  output/equipe-8_tache1_run3.csv
  ruby bin/scoring.rb data/task2_test.tokenize  output/equipe-8_tache2_run1.csv
  ruby bin/scoring.rb data/task3_test.tokenize  output/equipe-8_tache3_run3.csv
  ```
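
The steps above can also be chained in a single wrapper that stops at the first failing step. This is only a sketch: the `run_step` and `run_pipeline` helpers and the `DRY_RUN` switch are illustrative names, not part of the original scripts. By default it only lists the steps in order, and setting `DRY_RUN=0` executes them.

```shell
#!/bin/sh
# Run every step of the pipeline in order and stop at the first failure.
# DRY_RUN is an illustrative switch (not part of the original scripts).
# It defaults to 1, which only prints the steps.
DRY_RUN="${DRY_RUN:-1}"

# Announce one step, then execute it unless in dry-run mode.
run_step() {
  echo "==> $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@" || { echo "step failed: $*" >&2; exit 1; }
  fi
}

# Steps in the order given in the README.
run_pipeline() {
  run_step make
  run_step sh run_corpus.sh
  run_step sh run_word2vec.sh
  run_step sh run_cnn.sh
  run_step sh run_extract_dev.sh
  run_step sh run_extract_test.sh
  run_step sh run_fusion.sh
}

run_pipeline
```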
  
  
  
  # Results
  
  
  ## Baseline
  We reproduce the sentiment analysis system of Kim (based on word2vec and CNN):
  
  
  | Corpus       | Baseline      | 
  | ------------ |:-------------:|
  | Task1        |  59.55        |
  | Task2        |  77.18        |
  | Task3        |  57.59        |
  
  
  
  
  ## DEFT 2017
  These are the results of the LIA system submitted to DEFT 2017 Sentiment Analysis:
  
  | Run         | Task1         | Task2         | Task3         |
  | ----------- |:-------------:|:-------------:|:-------------:|
  | Run1        |  60.23        |  78.31        |  57.83        |
  | Run2        |  63.44        |  77.39        |  58.49        |
  | Run3        |  65.00        |  77.43        |  59.39        |
  
  
  # Citing
  
  The system is described in this paper:
  
      @inproceedings{rouvier2017,
        author    = {Mickael Rouvier and Pierre-Michel Bousquet},
        title     = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
        booktitle = {DEFT 2017},
        year      = {2017},
        address   = {Orleans, France}
      }