Rouvier Mickael / Deft 2017 - Sentiment Analysis

Commit d2e52398d07e0a6c75f549038a3eb23ea6c5004f

Authored by Rouvier Mickael 2017-06-27 14:08:58 +0200

Full-system

README

# DEFT 2017 - Sentiment Analysis

- Authors: Mickael Rouvier and Pierre-Michel Bousquet
- Version: 1.0
- Date: 26/06/17

These scripts provide the LIA system that I used for the DEFT 2017 Sentiment Analysis task. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the input of the CNNs: a lexical embedding, a sentiment embedding (multi-task learning), a sentiment embedding (distant learning) and a sentiment embedding (negative sampling). The outputs of the different CNN variants are combined by fusion at the score level.

You can reproduce my results or freely adapt my code for your experiments.

Before running anything else, build the system with the Makefile:

```shell
make
```

The following script splits the training corpus into K folds and tokenizes the tweets:

```shell
sh run_corpus.sh
```
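
This README does not detail how `run_corpus.sh` builds the folds. As a rough illustration only, the Ruby sketch below deals tweets to K folds round-robin and, for each fold, returns that fold as held-out data and the other K-1 folds as training data. The method name `k_fold_split`, the round-robin assignment and K = 5 are assumptions, not the script's actual behaviour.

```ruby
# Hypothetical sketch of a K-fold split (not the code of run_corpus.sh).
# Lines (one tweet each) are dealt to folds round-robin; for each fold i,
# fold i is held out and the other K-1 folds form the training part.
def k_fold_split(lines, k)
  raise ArgumentError, "k must be at least 2" if k < 2

  folds = Array.new(k) { [] }
  lines.each_with_index { |line, i| folds[i % k] << line }

  folds.each_with_index.map do |held_out, i|
    train = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    { train: train, held_out: held_out }
  end
end

tweets = (1..10).map { |n| "tweet #{n}" }
k_fold_split(tweets, 5).each_with_index do |split, i|
  puts "fold #{i}: #{split[:train].size} train / #{split[:held_out].size} held out"
end
```

A real split would usually shuffle the tweets first (for example with `lines.shuffle(random: Random.new(seed))`) so that the folds do not simply follow the corpus order.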

This script trains the different word embeddings:

```shell
sh run_word2vec.sh
```
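
Assuming, as its name suggests, that `run_word2vec.sh` relies on word2vec-style training, a lexical embedding of this kind is learned from (target, context) word pairs taken in a sliding window over each tokenized tweet. The Ruby sketch below only extracts those skip-gram pairs; the method name `skipgram_pairs` and the window size are illustrative assumptions, and the sentiment-specific training (multi-task learning, distant learning, negative sampling) is not reproduced.

```ruby
# Illustrative sketch (not run_word2vec.sh): build skip-gram
# (target, context) pairs from one tokenized tweet.
# `window` is the number of neighbours taken on each side (an assumption).
def skipgram_pairs(tokens, window)
  pairs = []
  tokens.each_with_index do |target, i|
    lo = [i - window, 0].max
    hi = [i + window, tokens.size - 1].min
    (lo..hi).each do |j|
      next if j == i # a word is not its own context
      pairs << [target, tokens[j]]
    end
  end
  pairs
end

p skipgram_pairs(%w[ce film est super], 1)
```

A larger window captures more topical context, while a small one tends to favour syntactic neighbours.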

This script trains the CNN models:

```shell
sh run_cnn.sh
```
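
A CNN in the style of Kim's system, reproduced as the baseline in the Results section, slides convolutional filters over windows of consecutive word vectors and keeps the strongest response of each filter (max-over-time pooling). The pure-Ruby sketch below computes the pooled feature of a single filter and a softmax over class scores. The method names, the ReLU activation, the zero output for sentences shorter than the filter and the toy vectors are assumptions; training is omitted, and this is not the code run by `run_cnn.sh`.

```ruby
# Illustrative sketch of a Kim-style CNN feature (not the code of run_cnn.sh).
# A sentence is an array of word vectors; a filter is an array of `width`
# weight vectors, one per position in the window.

# Slide the filter over every window of consecutive words, apply ReLU,
# then keep the maximum activation (max-over-time pooling).
def conv_max_pool(sentence, filter, bias)
  width = filter.size
  return 0.0 if sentence.size < width # sentence shorter than the filter

  activations = (0..sentence.size - width).map do |start|
    sum = bias
    filter.each_with_index do |filter_row, offset|
      word = sentence[start + offset]
      filter_row.each_with_index { |weight, dim| sum += weight * word[dim] }
    end
    [sum, 0.0].max # ReLU
  end
  activations.max
end

# Turn class scores (logits) into probabilities; subtracting the max
# keeps Math.exp from overflowing.
def softmax(logits)
  top = logits.max
  exps = logits.map { |z| Math.exp(z - top) }
  total = exps.sum
  exps.map { |e| e / total }
end

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] # 3 toy words, dimension 2
filter = [[1.0, 0.0], [0.0, 1.0]]               # window of 2 words
puts conv_max_pool(sentence, filter, 0.0)       # prints 2.0
```

In a full model, the pooled features of many filters (often of several widths) are concatenated and passed to a fully connected layer whose outputs go through `softmax`.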

These scripts run the trained models on the dev and test sets:

```shell
sh run_extract_dev.sh
sh run_extract_test.sh
```

At this point you can score the individual CNNs, for example:

```shell
ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
```
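
This README does not say which measure `bin/scoring.rb` reports. Assuming it is a macro-averaged F1 score, a common choice for polarity classification, the computation can be sketched as follows; the method name `macro_f1` and the toy labels are hypothetical.

```ruby
# Hypothetical sketch of a macro-averaged F1 score (not bin/scoring.rb).
# gold and predicted are label arrays aligned tweet by tweet.
def macro_f1(gold, predicted)
  raise ArgumentError, "gold and predicted differ in length" unless gold.size == predicted.size

  pairs = gold.zip(predicted)
  labels = (gold + predicted).uniq
  return 0.0 if labels.empty?

  f1_per_label = labels.map do |label|
    tp = pairs.count { |g, pr| g == label && pr == label }
    fp = pairs.count { |g, pr| g != label && pr == label }
    fn = pairs.count { |g, pr| g == label && pr != label }
    precision = tp + fp > 0 ? tp.fdiv(tp + fp) : 0.0
    recall = tp + fn > 0 ? tp.fdiv(tp + fn) : 0.0
    precision + recall > 0 ? 2 * precision * recall / (precision + recall) : 0.0
  end
  f1_per_label.sum / f1_per_label.size # unweighted mean over labels
end

puts macro_f1(%w[pos neg neg obj], %w[pos neg obj obj]).round(4) # prints 0.7778
```

Here the label set is the union of gold and predicted labels; some scorers average over the gold labels only, which can change the result.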

This script runs the fusion system:

```shell
sh run_fusion.sh
```
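
The fusion method implemented by `run_fusion.sh` is not described in this README. One simple form of score-level fusion, shown here as an assumption rather than the actual method, takes a (possibly weighted) average of the per-class scores that each CNN assigns to a tweet and keeps the class with the highest fused score. The method names, weights and toy scores are hypothetical.

```ruby
# Hypothetical sketch of weighted-average score-level fusion
# (not the code of run_fusion.sh).
# scores: one Hash per CNN, mapping each class label to that CNN's score
# for the same tweet. weights: one weight per CNN (equal by default).
def fuse_scores(scores, weights = nil)
  weights ||= Array.new(scores.size, 1.0)
  raise ArgumentError, "need one weight per CNN" unless weights.size == scores.size

  total_weight = weights.sum
  fused = Hash.new(0.0)
  scores.zip(weights).each do |cnn_scores, weight|
    cnn_scores.each { |label, score| fused[label] += weight * score / total_weight }
  end
  fused
end

# Keep the class with the highest fused score.
def fused_label(scores, weights = nil)
  fuse_scores(scores, weights).max_by { |_, score| score }.first
end

cnn_outputs = [ # toy scores from three CNN variants for one tweet
  { "positive" => 0.6, "negative" => 0.3, "objective" => 0.1 },
  { "positive" => 0.2, "negative" => 0.7, "objective" => 0.1 },
  { "positive" => 0.5, "negative" => 0.4, "objective" => 0.1 }
]
puts fused_label(cnn_outputs)                  # prints negative
puts fused_label(cnn_outputs, [2.0, 1.0, 1.0]) # prints positive
```

Equal weights give a plain average; non-uniform weights let some CNN variants count more, which is why the two calls above can disagree.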

Finally, you can score the full system:

```shell
ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
```

# Results

## Baseline

As a baseline, we reproduce the sentiment analysis system of Kim (word embeddings and a CNN):

| Corpus | Baseline |
| ------------ |:-------------:|
| Task1 | 59.55 |
| Task2 | 77.18 |
| Task3 | 57.59 |

## DEFT 2017

These are the results of the SENSEI-LIF system presented at the SemEval 2016 Sentiment Analysis task:

| Run | Task1 | Task2 | Task3 |
| ----------- |:-------------:|:-------------:|:-------------:|
| Run1 | 60.23 | 78.31 | 57.83 |
| Run2 | 63.44 | 77.39 | 58.49 |
| Run3 | 65.00 | 77.43 | 59.39 |

# Citing

The system is described in this paper:

@inproceedings{rouvier2017,
author = {Mickael Rouvier and Pierre-Michel Bousquet},
title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
booktitle = {DEFT 2017},
year = {2017},
address = {Orleans, France}
}