Rouvier Mickael / Deft 2017 - Sentiment Analysis

Commit 2dca315085d911c2612883625e1e06ec589fc623

Authored by Rouvier Mickael 8 years ago

1 parent d2e52398d0

Exists in master

Move README in README.md

Showing 2 changed files with 106 additions and 106 deletions

README
README.md

README

1		-# DEFT 2017 - Sentiment Analysis
2		-
3		-- Authors: Mickael Rouvier and Pierre-Michel Bousquet
4		-- Version: 1.0
5		-- Date: 26/06/17
6		-
7		-These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The system fuses the scores of the different CNN variants.
8		-
9		-
10		-You can reproduce my results or freely adapt my code for your experiments.
11		-
12		-
13		-Before running the system, build it with the makefile:
14		-
15		-```shell
16		-make
17		-```
18		-
19		-This script splits the training corpus (K-fold) and tokenizes the tweets:
20		-
21		-```shell
22		-sh run_corpus.sh
23		-```
24		-
25		-This script trains the different word embeddings:
26		-```shell
27		-sh run_word2vec.sh
28		-```
29		-
30		-This script trains the models:
31		-
32		-```shell
33		-sh run_cnn.sh
34		-```
35		-
36		-These scripts run the models on the dev and test sets:
37		-
38		-```shell
39		-sh run_extract_dev.sh
40		-sh run_extract_test.sh
41		-```
42		-
43		-At this point you can score the CNNs:
44		-```shell
45		-ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
46		-ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
47		-ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
48		-```
49		-
50		-
51		-This script runs the fusion system:
52		-```shell
53		-sh run_fusion.sh
54		-```
55		-
56		-
57		-Finally, you can score the full system:
58		-```shell
59		-ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
60		-ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
61		-ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
62		-```
63		-
64		-
65		-
66		-# Results
67		-
68		-
69		-## Baseline
70		-
71		-We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
72		-
73		-
74		-| Corpus | Baseline |
75		-| ------------ |:-------------:|
76		-| Task1 | 59.55 |
77		-| Task2 | 77.18 |
78		-| Task3 | 57.59 |
79		-
80		-
81		-
82		-
83		-## DEFT 2017
84		-
85		-These results are those of the SENSEI-LIF system presented at SemEval 2016 Sentiment Analysis:
86		-
87		-| Run | Task1 | Task2 | Task3 |
88		-| ----------- |:-------------:|:-------------:|:-------------:|
89		-| Run1 | 60.23 | 78.31 | 57.83 |
90		-| Run2 | 63.44 | 77.39 | 58.49 |
91		-| Run3 | 65.00 | 77.43 | 59.39 |
92		-
93		-
94		-# Citing
95		-
96		-The system is described in this paper:
97		-
98		- @inproceedings{rouvier2017,
99		- author = {Mickael Rouvier and Pierre-Michel Bousquet},
100		- title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
101		- booktitle = {DEFT 2017},
102		- year = {2017},
103		- address = {Orleans, France}
104		- }

README.md

	1	+# DEFT 2017 - Sentiment Analysis
	2	+
	3	+- Authors: Mickael Rouvier and Pierre-Michel Bousquet
	4	+- Version: 1.0
	5	+- Date: 26/06/17
	6	+
	7	+These scripts provide the LIA system that I used for DEFT 2017 - Sentiment Analysis. The LIA system is a multi-view ensemble of Convolutional Neural Networks (CNNs). Four different word embeddings are used to initialize the CNN input: lexical embeddings, sentiment embeddings (multi-task learning), sentiment embeddings (distant learning) and sentiment embeddings (negative sampling). The system fuses the scores of the different CNN variants.
	8	+
	9	+
	10	+You can reproduce my results or freely adapt my code for your experiments.
	11	+
	12	+
	13	+Before running the system, build it with the makefile:
	14	+
	15	+```shell
	16	+make
	17	+```
	18	+
	19	+This script splits the training corpus (K-fold) and tokenizes the tweets:
	20	+
	21	+```shell
	22	+sh run_corpus.sh
	23	+```
	24	+
	25	+This script trains the different word embeddings:
	26	+```shell
	27	+sh run_word2vec.sh
	28	+```
	29	+
	30	+This script trains the models:
	31	+
	32	+```shell
	33	+sh run_cnn.sh
	34	+```
	35	+
	36	+These scripts run the models on the dev and test sets:
	37	+
	38	+```shell
	39	+sh run_extract_dev.sh
	40	+sh run_extract_test.sh
	41	+```
	42	+
	43	+At this point you can score the CNNs:
	44	+```shell
	45	+ruby bin/scoring.rb data/task1_test.tokenize results_test/cnn_task1_0_distant_size100_123.txt
	46	+ruby bin/scoring.rb data/task2_test.tokenize results_test/cnn_task2_0_distant_size100_123.txt
	47	+ruby bin/scoring.rb data/task3_test.tokenize results_test/cnn_task3_0_distant_size100_123.txt
	48	+```
	49	+
	50	+
	51	+This script runs the fusion system:
	52	+```shell
	53	+sh run_fusion.sh
	54	+```
	55	+
	56	+
	57	+Finally, you can score the full system:
	58	+```shell
	59	+ruby bin/scoring.rb data/task1_test.tokenize output/equipe-8_tache1_run3.csv
	60	+ruby bin/scoring.rb data/task2_test.tokenize output/equipe-8_tache2_run1.csv
	61	+ruby bin/scoring.rb data/task3_test.tokenize output/equipe-8_tache3_run3.csv
	62	+```
	63	+
	64	+
	65	+
	66	+# Results
	67	+
	68	+
	69	+## Baseline
	70	+
	71	+We reproduce the sentiment analysis system of Kim (based on word embeddings and a CNN):
	72	+
	73	+
	74	+| Corpus | Baseline |
	75	+| ------------ |:-------------:|
	76	+| Task1 | 59.55 |
	77	+| Task2 | 77.18 |
	78	+| Task3 | 57.59 |
	79	+
	80	+
	81	+
	82	+
	83	+## DEFT 2017
	84	+
	85	+These results are those of the SENSEI-LIF system presented at SemEval 2016 Sentiment Analysis:
	86	+
	87	+| Run | Task1 | Task2 | Task3 |
	88	+| ----------- |:-------------:|:-------------:|:-------------:|
	89	+| Run1 | 60.23 | 78.31 | 57.83 |
	90	+| Run2 | 63.44 | 77.39 | 58.49 |
	91	+| Run3 | 65.00 | 77.43 | 59.39 |
	92	+
	93	+
	94	+# Citing
	95	+
	96	+The system is described in this paper:
	97	+
	98	+ @inproceedings{rouvier2017,
	99	+ author = {Mickael Rouvier and Pierre-Michel Bousquet},
	100	+ title = {LIA @ DEFT’2017 : Multi-view Ensemble of Convolutional Neural Network},
	101	+ booktitle = {DEFT 2017},
	102	+ year = {2017},
	103	+ address = {Orleans, France}
	104	+ }
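
The README above describes the final system as a fusion at the score level of the CNN variants. A minimal Ruby sketch of that idea follows, assuming each CNN outputs one score per class for a given tweet. The function names `fuse_scores` and `predict`, the class labels, and the weights are hypothetical and not taken from the repository; the actual `run_fusion.sh` may combine scores differently (for example with learned weights).

```ruby
# Hypothetical sketch of score-level fusion: average the per-class scores
# produced by several systems, then pick the class with the highest average.
# Nothing here is taken from the repository; labels and weights are made up.

# Fuse the class scores of several systems for one tweet.
# system_scores: array of hashes mapping a class label to a score.
# weights: optional array with one weight per system (defaults to equal weights).
def fuse_scores(system_scores, weights = nil)
  raise ArgumentError, "no systems to fuse" if system_scores.empty?

  weights ||= Array.new(system_scores.size, 1.0)
  unless weights.size == system_scores.size
    raise ArgumentError, "one weight per system is required"
  end

  total = weights.sum.to_f
  raise ArgumentError, "weights must not sum to zero" if total.zero?

  # Accumulate the weighted, normalized score of each label.
  fused = Hash.new(0.0)
  system_scores.each_with_index do |scores, i|
    scores.each { |label, score| fused[label] += weights[i] * score / total }
  end
  fused
end

# Return the label with the highest fused score.
def predict(system_scores, weights = nil)
  fuse_scores(system_scores, weights).max_by { |_label, score| score }.first
end

# Three illustrative "views" (one per CNN variant) scoring the same tweet.
views = [
  { "positive" => 0.6, "negative" => 0.3, "objective" => 0.1 },
  { "positive" => 0.2, "negative" => 0.7, "objective" => 0.1 },
  { "positive" => 0.3, "negative" => 0.5, "objective" => 0.2 }
]

puts predict(views) # prints "negative"
```

With equal weights the second and third views outvote the first. Passing weights such as `[3.0, 1.0, 1.0]` favors the first view, and the prediction becomes `positive`.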