# DEFT 2017 - Sentiment Analysis

- Authors: Mickael Rouvier and Pierre-Michel Bousquet
- Version: 1.0
- Date: 26/06/17

These scripts provide the LIA system that I used for the DEFT 2017 sentiment analysis challenge. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the input of the CNNs: a lexical embedding and three sentiment embeddings, trained respectively with multi-task learning, distant learning and negative sampling. The final system fuses the different CNN variants at the score level.

You can reproduce my results or freely adapt my code for your experiments.

Before running the system, first execute the makefile:

```shell
make
```

This script splits the training corpus into K folds and tokenizes the tweets:

```shell
sh run_corpus.sh
```

This script trains the different word embeddings:

```shell
sh run_word2vec.sh
```

This script trains the CNN models:

```shell
sh run_cnn.sh
```

These scripts run the models on the dev and test sets:

```shell
sh run_extract_dev.sh
sh run_extract_test.sh
```
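The K-fold split performed by `run_corpus.sh` is not detailed further in this README. As an illustration only, here is a minimal Ruby sketch of a generic K-fold split of a list of tweets. It is not the code of `run_corpus.sh`: the method name `k_fold_splits`, the round-robin fold assignment, the shuffling and the seed are all assumptions.

```ruby
# Hypothetical sketch of a K-fold split: NOT the code of run_corpus.sh.
# Returns K [train, held_out] pairs; every example is held out exactly once.
def k_fold_splits(examples, k, seed: 123)
  raise ArgumentError, "k must be between 2 and the number of examples" unless k.between?(2, examples.size)

  # Shuffle deterministically (assumed) so that the split is reproducible.
  shuffled = examples.shuffle(random: Random.new(seed))

  # Assign example i to fold i % k, so fold sizes differ by at most one.
  folds = Array.new(k) { [] }
  shuffled.each_with_index { |ex, i| folds[i % k] << ex }

  folds.each_index.map do |i|
    held_out = folds[i]
    # The training part is the concatenation of all the other folds.
    train = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    [train, held_out]
  end
end

tweets = (1..10).map { |i| "tweet #{i}" }
k_fold_splits(tweets, 5).each_with_index do |(train, held_out), i|
  puts "fold #{i}: #{train.size} train / #{held_out.size} held out"
end
```

With 10 tweets and K = 5, each fold holds out 2 tweets and trains on the remaining 8.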
At this point you can evaluate the CNNs:
```shell
ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
```

This script runs the fusion system:

```shell
sh run_fusion.sh
```
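The combination rule used by `run_fusion.sh` and the format of the score files are not documented in this README. The following Ruby sketch shows one simple form of score-level fusion under its own assumptions: each CNN variant outputs one row of per-class scores per tweet (classes in the same order for every system), the rows are combined by an optionally weighted average, and the predicted label is the class with the highest fused score. The method names `fuse_scores` and `predict`, the system names, the labels and the scores are hypothetical.

```ruby
# Hypothetical sketch of score-level fusion: NOT the code of run_fusion.sh.
# systems[s][t][c] = score given by system s to class c for tweet t.
def fuse_scores(systems, weights = nil)
  weights ||= Array.new(systems.size, 1.0)
  raise ArgumentError, "one weight per system" unless weights.size == systems.size

  total = weights.sum
  n_tweets = systems.first.size
  (0...n_tweets).map do |t|
    n_classes = systems.first[t].size
    # Weighted average of the scores that all systems give to class c for tweet t.
    (0...n_classes).map do |c|
      systems.each_with_index.sum { |scores, s| weights[s] * scores[t][c] } / total
    end
  end
end

# Predict, for each tweet, the class whose fused score is highest.
def predict(fused, labels)
  fused.map { |row| labels[row.each_with_index.max_by { |score, _| score }.last] }
end

labels  = %w[objective positive negative mixed] # illustrative labels
lexical = [[0.1, 0.6, 0.2, 0.1], [0.5, 0.1, 0.3, 0.1]] # scores from one variant
distant = [[0.2, 0.3, 0.4, 0.1], [0.2, 0.1, 0.6, 0.1]] # scores from another
p predict(fuse_scores([lexical, distant]), labels) # prints ["positive", "negative"]
```

An unweighted average is the simplest choice; the weights could instead be tuned on the dev set, or the scores fed to a learned classifier. This README does not say which option `run_fusion.sh` uses.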
Finally, you can evaluate the full system:
```shell
ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
```

# Results

## Baseline
As a baseline, we reproduce the CNN sentence classification system of Kim (word2vec embeddings fed to a CNN):
| Corpus | Baseline |
| ------ |:--------:|
| Task1  | 59.55    |
| Task2  | 77.18    |
| Task3  | 57.59    |

## DEFT 2017
These are the results of the runs submitted by the LIA system to DEFT 2017:
| Run  | Task1 | Task2 | Task3 |
| ---- |:-----:|:-----:|:-----:|
| Run1 | 60.23 | 78.31 | 57.83 |
| Run2 | 63.44 | 77.39 | 58.49 |
| Run3 | 65.00 | 77.43 | 59.39 |

# Citing

The system is described in this paper:

```bibtex
@inproceedings{rouvier2017,
  author    = {Mickael Rouvier and Pierre-Michel Bousquet},
  title     = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
  booktitle = {DEFT 2017},
  year      = {2017},
  address   = {Orleans, France}
}
```