Rouvier Mickael / Deft 2017 - Sentiment Analysis

Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f

Authored by Rouvier Mickael 2017-06-27 14:08:58 +0200

1 parent f5fbe398d6

Exists in master

Full-system

Showing 1 changed file with 106 additions and 0 deletions

README

	1	+# DEFT 2017 - Sentiment Analysis
	2	+
	3	+- Authors: Mickael Rouvier and Pierre-Michel Bousquet
	4	+- Version: 1.0
	5	+- Date: 26/06/17
	6	+
	7	+These scripts provide the LIA system that I used for the DEFT 2017 Sentiment Analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The system fuses the different CNN variants at the score level.
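
To make "fusion at the score level" concrete, here is a minimal sketch that averages the class scores of several systems and picks the highest-scoring class. It is an illustration only: unweighted averaging is a stand-in and may differ from the fusion actually used by the system, `fuse_scores` is a hypothetical helper, and the file format (one tweet per line, one space-separated score per class, same order in every file) is an assumption, not the format of the files in `results_test/`.

```shell
#!/bin/sh
# Illustration only: average the class scores of several systems, line by line.
# Assumed (hypothetical) input format: one tweet per line, one score per class,
# separated by spaces, with the same line and column order in every file.
fuse_scores() {
    awk '
        {
            # FNR is the line number inside the current file.
            for (i = 1; i <= NF; i++) sum[FNR, i] += $i
            if (FNR > rows) rows = FNR
            if (NF > cols) cols = NF
        }
        END {
            nfiles = ARGC - 1   # ARGV[0] is awk itself
            for (r = 1; r <= rows; r++) {
                best = 1
                for (c = 1; c <= cols; c++) {
                    avg[c] = sum[r, c] / nfiles
                    if (avg[c] > avg[best]) best = c
                }
                # Print the index of the winning class, then the averaged scores.
                line = best
                for (c = 1; c <= cols; c++) line = line sprintf(" %.2f", avg[c])
                print line
            }
        }
    ' "$@"
}
```

Calling `fuse_scores system_a.txt system_b.txt` (hypothetical file names) prints one line per tweet: the 1-based index of the winning class followed by the averaged scores.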
	8	+
	9	+
	10	+You can reproduce my results or freely adapt my code for your experiments.
	11	+
	12	+
	13	+Before running the system, build it with the makefile:
	14	+
	15	+```shell
	16	+make
	17	+```
	18	+
	19	+This script splits the training corpus (K-fold) and tokenizes the tweets:
	20	+
	21	+```shell
	22	+sh run_corpus.sh
	23	+```
	24	+
	25	+This script trains the different word embeddings:
	26	+```shell
	27	+sh run_word2vec.sh
	28	+```
	29	+
	30	+This script trains the CNN models:
	31	+
	32	+```shell
	33	+sh run_cnn.sh
	34	+```
	35	+
	36	+These scripts run the models on the dev and test sets:
	37	+
	38	+```shell
	39	+sh run_extract_dev.sh
	40	+sh run_extract_test.sh
	41	+```
	42	+
	43	+At this point you can score the CNNs:
	44	+```shell
	45	+ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
	46	+ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
	47	+ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
	48	+```
	49	+
	50	+
	51	+This script runs the fusion system:
	52	+```shell
	53	+sh run_fusion.sh
	54	+```
	55	+
	56	+
	57	+Finally, you can score the full system:
	58	+```shell
	59	+ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
	60	+ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
	61	+ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
	62	+```
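
The steps above can be chained in a small wrapper. The sketch below is a hypothetical helper, not part of the repository: `run_step`, `run_pipeline` and the `DRY_RUN` variable are names introduced here, and only the commands themselves come from this README. It assumes it is launched from the repository root. By default it only prints the commands; with `DRY_RUN=0` it executes them, and `set -e` stops the run at the first failing step.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the README steps in order.
set -eu

# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run_pipeline() {
    run_step make                     # build the tools
    run_step sh run_corpus.sh         # K-fold split and tokenization
    run_step sh run_word2vec.sh       # word embeddings
    run_step sh run_cnn.sh            # CNN training
    run_step sh run_extract_dev.sh    # predictions on dev
    run_step sh run_extract_test.sh   # predictions on test
    run_step sh run_fusion.sh         # score-level fusion
}

run_pipeline
```

Scoring is left out on purpose, since the `ruby bin/scoring.rb` commands depend on which task and run you want to evaluate.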
	63	+
	64	+
	65	+
	66	+# Results
	67	+
	68	+
	69	+## Baseline
	70	+
	71	+We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
	72	+
	73	+
	74	+| Corpus | Baseline |
	75	+| ------------ |:-------------:|
	76	+| Task1 | 59.55 |
	77	+| Task2 | 77.18 |
	78	+| Task3 | 57.59 |
	79	+
	80	+
	81	+
	82	+
	83	+## DEFT 2017
	84	+
	85	+These results are those of the SENSEI-LIF system presented at SemEval 2016 Sentiment Analysis:
	86	+
	87	+| Run | Task1 | Task2 | Task3 |
	88	+| ----------- |:-------------:|:-------------:|:-------------:|
	89	+| Run1 | 60.23 | 78.31 | 57.83 |
	90	+| Run2 | 63.44 | 77.39 | 58.49 |
	91	+| Run3 | 65.00 | 77.43 | 59.39 |
	92	+
	93	+
	94	+# Citing
	95	+
	96	+The system is described in this paper:
	97	+
	98	+ @inproceedings{rouvier2017,
	99	+ author = {Mickael Rouvier and Pierre-Michel Bousquet},
	100	+ title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
	101	+ booktitle = {DEFT 2017},
	102	+ year = {2017},
	103	+ address = {Orleans, France}
	104	+ }