 

Phrase corpus for sentiment analysis

Tags:

python

nlp

nltk

Good day, I'm attempting to write a sentiment analysis application in Python (using a naive Bayes classifier) with the aim of categorizing phrases from news as positive or negative, and I'm having trouble finding an appropriate corpus for it. I tried the General Inquirer (http://www.wjh.harvard.edu/~inquirer/homecat.htm), which works OK, but it has one big problem: since it is a word list rather than a phrase list, I run into trouble when labeling the following sentence:

He is not expected to win.

This sentence is categorized as positive, which is wrong. The reason is that "win" is positive, while "not" carries no sentiment of its own in the word list, so the negation expressed by the phrase "not ... win" is lost. Can anyone suggest either a corpus or a workaround for this issue? Your help and insight are greatly appreciated.
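To make the failure concrete, here is a minimal sketch of per-word lexicon scoring. The tiny LEXICON dictionary is a hypothetical stand-in for a word list such as the General Inquirer, not its actual contents; the point is only that a word missing from the list ("not") contributes nothing, so the positive "win" decides the label.

```python
import re

# Hypothetical toy lexicon standing in for a word list such as General Inquirer.
# The real categories are far larger; these entries are illustrative only.
LEXICON = {"win": 1, "good": 1, "lose": -1, "bad": -1}


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_list_score(text):
    """Sum per-word polarities; words absent from the lexicon (like 'not') count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def label(text):
    """Map the summed score to a coarse label."""
    score = word_list_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "He is not expected to win."
print(word_list_score(sentence), label(sentence))  # prints: 1 positive
```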

asked May 28 '12 at 19:05 by TE0


2 Answers

See for example: "What's great and what's not: learning to classify the scope of negation for improved sentiment analysis" by Councill, McDonald, and Velikovich

http://dl.acm.org/citation.cfm?id=1858959.1858969

and the follow-up work that cites it,

http://scholar.google.com/scholar?cites=3029019835762139237&as_sdt=5,33&sciodt=0,33&hl=en

e.g. by Morante et al. (2011):

http://eprints.pascal-network.org/archive/00007634/
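The cited papers learn the scope of a negation from annotated data. A much cruder baseline, sketched here under the assumption of a fixed set of negation cues and punctuation as the scope boundary, is to suffix every token after a cue with _NEG until the clause ends. Feeding these tokens to a bag-of-words naive Bayes classifier makes "win" and "win_NEG" separate features. This fixed rule is a simplification, not the method from those papers.

```python
import re

# Assumed negation cues and scope-ending punctuation for this sketch; a learned
# scope model (as in the cited papers) would replace this fixed rule.
NEGATION_CUES = {"not", "no", "never", "cannot"}
CLAUSE_END = set(".,:;!?")


def tokenize(text):
    """Split into lowercase words, keeping punctuation as separate tokens."""
    return re.findall(r"[a-z']+|[.,:;!?]", text.lower())


def is_negation(token):
    """Treat listed cues and contractions such as "isn't" as negations."""
    return token in NEGATION_CUES or token.endswith("n't")


def mark_negation_scope(tokens):
    """Append _NEG to every token after a negation cue until clause-ending punctuation."""
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in CLAUSE_END:
            in_scope = False
            marked.append(tok)
        elif is_negation(tok):
            in_scope = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked


def features(text):
    """Bag-of-words feature dict, the format NLTK's NaiveBayesClassifier consumes."""
    return {tok: True for tok in mark_negation_scope(tokenize(text))}


print(mark_negation_scope(tokenize("He is not expected to win.")))
# prints: ['he', 'is', 'not', 'expected_NEG', 'to_NEG', 'win_NEG', '.']
```

With NLTK, training would then look like nltk.NaiveBayesClassifier.train([(features(text), label) for text, label in labeled_phrases]), where labeled_phrases is a hypothetical list of (phrase, label) pairs you would still need to collect. NLTK 3.x also ships a comparable helper, mark_negation in nltk.sentiment.util.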

answered Sep 18 '22 at 03:09 by Gregory Marton


In this case, the word "not" modifies the meaning of the phrase "expected to win", reversing it. To identify this, you would need to POS tag the sentence and apply the negative adverb "not" to the (I think) verb phrase as a negation. I don't know of a corpus that would tell you that "not" acts as this kind of modifier, however.
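A rough sketch of the POS-based idea, assuming the sentence has already been tagged with Penn Treebank tags (in practice something like nltk.pos_tag(nltk.word_tokenize(sentence)) would produce the pairs, once the tokenizer and tagger models are downloaded; here the tags are supplied by hand). The "verb phrase" is approximated as the run of verb, adverb, "to", modal and adjective tags after the negator, which is a crude stand-in for a real chunker or parser. The lexicon is again hypothetical.

```python
# Hypothetical toy lexicon; polarities are illustrative, not from General Inquirer.
LEXICON = {"win": 1, "good": 1, "lose": -1, "bad": -1}

NEGATORS = {"not", "n't", "never"}
# Penn Treebank tag prefixes treated as part of the negated verb phrase.
# This is an assumption standing in for proper chunking or parsing.
SCOPE_TAG_PREFIXES = ("VB", "RB", "TO", "MD", "JJ")


def score_tagged(tagged):
    """Score (word, tag) pairs, flipping polarity inside a negated verb phrase.

    A negator tagged RB opens a scope that covers the following tokens whose
    tags start with one of SCOPE_TAG_PREFIXES; the first other tag closes it.
    """
    total = 0
    negated = False
    for word, tag in tagged:
        w = word.lower()
        if w in NEGATORS and tag == "RB":
            negated = True
            continue
        if negated and not tag.startswith(SCOPE_TAG_PREFIXES):
            negated = False
        polarity = LEXICON.get(w, 0)
        total += -polarity if negated else polarity
    return total


# Tags supplied by hand for illustration.
tagged = [("He", "PRP"), ("is", "VBZ"), ("not", "RB"), ("expected", "VBN"),
          ("to", "TO"), ("win", "VB"), (".", ".")]
print(score_tagged(tagged))  # prints: -1
```

Restricting the flip to the tagged phrase, rather than everything up to punctuation, keeps a later positive word in a different constituent from being inverted, at the cost of depending on tagging quality.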

answered Sep 19 '22 at 03:09 by stevedbrown