Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f

Authored by Rouvier Mickael
1 parent f5fbe398d6
Exists in master

Full-system

Showing 1 changed file with 106 additions and 0 deletions

  1 +# DEFT 2017 - Sentiment Analysis
  2 +
  3 +- Authors: Mickael Rouvier and Pierre-Michel Bousquet
  4 +- Version: 1.0
  5 +- Date: 26/06/17
  6 +
  7 +These scripts provide the LIA system that I used for the DEFT 2017 Sentiment Analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the input of the CNNs: lexical embedding, sentiment embedding (multi-task learning), sentiment embedding (distant learning) and sentiment embedding (negative sampling). The system is a score-level fusion of the different CNN variants.
  8 +
  9 +
  10 +You can reproduce my results or freely adapt my code for your experiments.
  11 +
  12 +
  13 +Before running the system, build it with the makefile:
  14 +
  15 +```shell
  16 +make
  17 +```
  18 +
  19 +This script splits the training corpus (K-fold) and tokenizes the tweets:
  20 +
  21 +```shell
  22 +sh run_corpus.sh
  23 +```
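As a rough illustration of the K-fold step, the sketch below partitions a list of training examples into `k` folds and yields one (train, held-out) pair per fold. The function name `k_fold_split`, the round-robin assignment, and the value of `k` are assumptions made for illustration; the split actually performed by `run_corpus.sh` may differ.

```python
def k_fold_split(items, k):
    """Return k (train, held_out) pairs; each item is held out exactly once.

    Hypothetical helper: items are assigned to folds round-robin.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Fold i receives items i, i + k, i + 2k, ...
    folds = [items[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        held_out = folds[i]
        # Training data is every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, held_out))
    return splits
```

Calling `k_fold_split(tweets, 5)` on a list of tweets would give five train/held-out pairs, each usable for training one model and scoring it on unseen data.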
  24 +
  25 +This script trains the different word embeddings:
  26 +```shell
  27 +sh run_word2vec.sh
  28 +```
  29 +
  30 +This script trains the models:
  31 +
  32 +```shell
  33 +sh run_cnn.sh
  34 +```
  35 +
  36 +This script runs the models on the dev and test sets:
  37 +
  38 +```shell
  39 +sh run_extract_dev.sh
  40 +sh run_extract_test.sh
  41 +```
  42 +
  43 +At this point you can score the CNNs:
  44 +```shell
  45 +ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
  46 +ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
  47 +ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
  48 +```
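The metric computed by `bin/scoring.rb` is not stated in this README. As a sketch only, assuming a macro-averaged F1 over the labels (a common choice for sentiment classification), scoring a list of predictions against gold labels could look like this. The function name `macro_f1` and the label values are hypothetical.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: unweighted mean of the per-label F1 scores."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)
```

Macro averaging gives every label the same weight regardless of how often it occurs, so rare classes influence the score as much as frequent ones.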
  49 +
  50 +
  51 +This script runs the fusion system:
  52 +```shell
  53 +sh run_fusion.sh
  54 +```
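To make the idea of score-level fusion concrete, the sketch below combines the per-class scores of several CNN variants by a weighted average and picks the highest-scoring class. The names `fuse_scores` and `predict`, the uniform default weights, and the example scores are assumptions for illustration; the fusion actually implemented by `run_fusion.sh` is not described in this README.

```python
def fuse_scores(system_scores, weights=None):
    """Weighted average of per-class score vectors, one vector per system."""
    if not system_scores:
        raise ValueError("at least one system is required")
    n_classes = len(system_scores[0])
    if weights is None:
        # Assumption: every system contributes equally.
        weights = [1.0 / len(system_scores)] * len(system_scores)
    if len(weights) != len(system_scores):
        raise ValueError("one weight per system is required")
    fused = [0.0] * n_classes
    for scores, weight in zip(system_scores, weights):
        if len(scores) != n_classes:
            raise ValueError("all systems must score the same classes")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


def predict(fused):
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `predict(fuse_scores([[0.2, 0.8], [0.6, 0.4]]))` selects class 1, because the averaged scores are roughly 0.4 and 0.6.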
  55 +
  56 +
  57 +Finally, you can score the full system:
  58 +```shell
  59 +ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
  60 +ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
  61 +ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
  62 +```
  63 +
  64 +
  65 +
  66 +# Results
  67 +
  68 +
  69 +## Baseline
  70 +
  71 +We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
  72 +
  73 +
  74 +| Corpus | Baseline |
  75 +| ------------ |:-------------:|
  76 +| Task1 | 59.55 |
  77 +| Task2 | 77.18 |
  78 +| Task3 | 57.59 |
  79 +
  80 +
  81 +
  82 +
  83 +## DEFT 2017
  84 +
  85 +These are the results of the three runs of the LIA system on the DEFT 2017 tasks:
  86 +
  87 +| Run | Task1 | Task2 | Task3 |
  88 +| ----------- |:-------------:|:-------------:|:-------------:|
  89 +| Run1 | 60.23 | 78.31 | 57.83 |
  90 +| Run2 | 63.44 | 77.39 | 58.49 |
  91 +| Run3 | 65.00 | 77.43 | 59.39 |
  92 +
  93 +
  94 +# Citing
  95 +
  96 +The system is described in this paper:
  97 +
  98 + @inproceedings{rouvier2017,
  99 + author = {Mickael Rouvier and Pierre-Michel Bousquet},
  100 + title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
  101 + booktitle = {DEFT 2017},
  102 + year = {2017},
  103 + address = {Orleans, France}
  104 + }