Good day, I'm attempting to write a sentiment analysis application in Python (using a naive Bayes classifier) that categorizes phrases from news as positive or negative, and I'm having trouble finding an appropriate corpus for it. I tried the "General Inquirer" (http://www.wjh.harvard.edu/~inquirer/homecat.htm), which works OK, but it has one big problem: since it is a word list, not a phrase list, the following sentence gets the wrong label:
He is not expected to win.
This sentence is categorized as positive, which is wrong. The reason is that "win" is positive, while "not" carries no sentiment of its own in a word list, even though "not ... win" should be read together as a negated phrase. Can anyone suggest either a corpus or a workaround for this issue? Your help and insight are greatly appreciated.
See for example: "What's great and what's not: learning to classify the scope of negation for improved sentiment analysis" by Councill, McDonald, and Velikovich
http://dl.acm.org/citation.cfm?id=1858959.1858969
and followups,
http://scholar.google.com/scholar?cites=3029019835762139237&as_sdt=5,33&sciodt=0,33&hl=en
e.g. by Morante et al. (2011):
http://eprints.pascal-network.org/archive/00007634/
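As a rough baseline for the negation-scope idea (a simple heuristic, not the method from the papers above), you can rewrite the tokens that follow a negation cue so that the classifier sees "NOT_win" as a different feature from "win". The cue list, the "NOT_" prefix, and the function name `mark_negation` below are my own assumptions for illustration, and the scope is crudely assumed to run until the next punctuation mark:

```python
import re

# Hypothetical, deliberately small cue list; a real system would need more.
NEGATION_CUES = {"not", "no", "never", "cannot"}
# Assumption: a negation's scope ends at the next punctuation mark.
CLAUSE_END = {".", ",", ";", ":", "!", "?"}


def mark_negation(sentence):
    """Return lowercase tokens, prefixing those inside a negation scope with NOT_."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", sentence.lower())
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in CLAUSE_END:
            # Punctuation closes any open negation scope.
            in_scope = False
            marked.append(tok)
        elif tok in NEGATION_CUES or tok.endswith("n't"):
            # The cue itself is kept as-is and opens a scope.
            in_scope = True
            marked.append(tok)
        elif in_scope:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
    return marked


print(mark_negation("He is not expected to win."))
# ['he', 'is', 'not', 'NOT_expected', 'NOT_to', 'NOT_win', '.']
```

If you train the naive Bayes classifier on features produced this way, "NOT_win" gets its own probability estimate, so it no longer inherits the positive weight of "win". This only helps if the training data is labeled at the phrase or sentence level.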
In this case, the word "not" modifies the meaning of the phrase "expected to win", reversing it. To identify this, you would need to POS-tag the sentence and apply the negative adverb "not" to the (I think) verb phrase as a negation. However, I don't know whether there is a corpus that would tell you that "not" is this type of modifier.
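To make the reversal concrete, here is a minimal sketch that flips the polarity of a lexicon word when a negator appears shortly before it. It skips the POS tagging step entirely and uses a fixed look-back window as a crude stand-in for attaching "not" to the verb phrase. The tiny `POLARITY` dictionary, the `NEGATORS` set, and the `window` parameter are all hypothetical placeholders, not part of the General Inquirer or any library:

```python
import re

# Toy polarity lexicon standing in for a real word list (assumption).
POLARITY = {"win": 1, "good": 1, "lose": -1, "bad": -1}
# Hypothetical set of negators; extend as needed.
NEGATORS = {"not", "never", "no"}


def lexicon_score(sentence, window=3):
    """Sum word polarities, reversing any word preceded by a negator
    within the last `window` tokens."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = POLARITY.get(tok, 0)
        if polarity == 0:
            continue  # word carries no sentiment in the lexicon
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATORS for t in preceding):
            polarity = -polarity  # negation reverses the meaning
        score += polarity
    return score


print(lexicon_score("He is not expected to win."))  # -1
print(lexicon_score("He is expected to win."))      # 1
```

A fixed window will misfire on longer sentences, which is where a POS tagger or parser would do better at deciding what "not" actually attaches to.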