Download zip Select Archive Format
Name Last Update history
File dir bin Loading commit data...
File dir data Loading commit data...
File dir db Loading commit data...
File dir db_corrige Loading commit data...
File dir dico Loading commit data...
File dir embedding Loading commit data...
File dir model Loading commit data...
File dir output Loading commit data...
File dir results Loading commit data...
File dir results_test Loading commit data...
File dir twitter Loading commit data...
File txt Makefile Loading commit data...
File txt README.md Loading commit data...
File txt run_cnn.sh Loading commit data...
File txt run_corpus.sh Loading commit data...
File txt run_extract_dev.sh Loading commit data...
File txt run_extract_test.sh Loading commit data...
File txt run_fusion.sh Loading commit data...
File txt run_word2vec.sh Loading commit data...

README.md

DEFT 2017 - Sentiment Analysis

  • Authors: Mickael Rouvier and Pierre-Michel Bousquet
  • Version: 1.0
  • Date: 26/06/17

These scripts provide the LIA system that I used for the DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNN). Four different word embeddins are used to initialize the input of CNN : lexical embedding, sentiment embedding (multi-task learning), sentiment embedding (distant learning) and sentiment embedding (negative sampling). The system is a fusion at the score level of the different CNNs variants.

You can reproduce my results or freely adapt my code for your experiments.

Warning, before to run the system execute the makefile:

make

This executable split the training corpus (K-Fold) and tokenize the tweets:

sh run_corpus.sh

This executable train the different word embeddings:

sh run_word2vec.sh

This executable learn the models:

sh run_cnn.sh

This executable run the model on dev and test:

sh run_extract_dev.sh
sh run_extract_test.sh

At this point you can evaluate the CNNs:

ruby bin/scoring.rb data/task1_test.tokenize  results_test/cnn_task1_0_distant_size100_123.txt
ruby bin/scoring.rb data/task2_test.tokenize  results_test/cnn_task2_0_distant_size100_123.txt
ruby bin/scoring.rb data/task3_test.tokenize  results_test/cnn_task3_0_distant_size100_123.txt

This executable run the fusion system:

sh run_fusion.sh

Finally, you can evaluate the full-system:

ruby bin/scoring.rb data/task1_test.tokenize  output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize  output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize  output/equipe-8_tache3_run3.csv

Results

Baseline

We reproduce the sentiment analysis system of Kim (based on word2vec and CNN):

Corpus Baseline
Task1 59.55
Task2 77.18
Task3 57.59

DEFT 2017

These results are those LIA system presented in SemEval 2016 Sentiment Analysis:

Corpus Task1 Task2 Task3
Run1 60.23 78.31 57.83
Run2 63.44 77.39 58.49
Run3 65.00 77.43 59.39

Citing

The system is described in this paper:

@inproceedings{rouvier2017,
  author    = {Mickael Rouvier and Pierre-Michel Bousquet},
  title     = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
  booktitle = {DEFT 2107},
  year      = {2017},
  address   = {Orleans, France}
}