Determining "Mood" of Textual Phrases through Lexical Analysis

I am looking to assign scores (positive, negative or neutral) to short phrases of text. Short of parsing out emoticons and making assumptions based on their usage, I'm unsure what else to try. Can anyone point me to examples, research papers, articles, etc. that take a more lexical approach to this problem?

I am thinking that things like adverb usage, punctuation misuse/repetition, and spelling/grammar errors could all be decent indicators of the author's mood in an almost binary sense (good or bad).
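As a rough illustration of how those cues could be turned into numeric features, here is a minimal Python sketch. The intensifier word list, the feature names, and the use of letter elongation ("sooo") as a stand-in for spelling errors are all illustrative assumptions; detecting real spelling or grammar errors would need a dictionary or a proper checker.

```python
import re
from typing import Dict

# Hypothetical, deliberately tiny list of intensifying adverbs.
INTENSIFIERS = {"very", "really", "totally", "so", "extremely", "absolutely"}


def extract_features(text: str) -> Dict[str, float]:
    """Turn a short phrase into simple surface features (illustrative only)."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    n_tokens = max(len(tokens), 1)
    n_alpha = max(sum(c.isalpha() for c in text), 1)
    return {
        # Share of tokens that are intensifying adverbs.
        "intensifier_ratio": sum(t in INTENSIFIERS for t in tokens) / n_tokens,
        # Runs of repeated punctuation such as "!!!" or "?!?".
        "repeated_punct": float(len(re.findall(r"[!?.]{2,}", text))),
        # A letter repeated 3+ times ("sooo"): a rough proxy for nonstandard spelling.
        "elongated_words": float(len(re.findall(r"([a-z])\1{2,}", lowered))),
        # Fraction of letters in upper case ("shouting").
        "caps_ratio": sum(c.isupper() for c in text) / n_alpha,
        # Simple emoticon counts.
        "smiley": float(len(re.findall(r"[:;]-?\)", text))),
        "frowny": float(len(re.findall(r":-?\(", text))),
    }


print(extract_features("This is sooo bad!!! :("))
```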

asked Jun 15 '09 at 15:06 by Michael Wales



1 Answer

This sounds like a fairly clear binary classification task: you can simplify the problem to positive versus negative, and then label as neutral the highest-entropy decisions, i.e. those where the probability mass assigned to either class does not reach a certainty threshold.
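A minimal sketch of that neutral band, assuming the binary classifier outputs P(positive): compute the entropy of the two-class distribution and call anything above a threshold neutral. The 0.9-bit threshold is an arbitrary placeholder that would need tuning on held-out data, not a recommended value.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of the two-class distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def three_way_label(p_positive: float, max_entropy: float = 0.9) -> str:
    """Collapse a binary prediction into positive / negative / neutral.

    Predictions whose entropy exceeds max_entropy (the model is close to a
    coin flip) are labeled neutral; max_entropy is an assumed tuning knob.
    """
    if binary_entropy(p_positive) > max_entropy:
        return "neutral"
    return "positive" if p_positive > 0.5 else "negative"


for p in (0.95, 0.65, 0.5, 0.3, 0.1):
    print(p, three_way_label(p))
```

Thresholding on entropy is equivalent here to a symmetric band around 0.5; an asymmetric pair of probability cut-offs would be the obvious alternative if one class is costlier to get wrong.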

Your biggest hurdle will be getting training data for a statistical machine learning method. Once you have it, you could easily use a readily available maximum entropy toolkit such as the Toolkit for Advanced Discriminative Modeling (TADM) or Mallet. The features you described would just have to be converted into the input format these tools expect.
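To show what such a toolkit actually fits: for two classes, a conditional maximum entropy model is equivalent to logistic regression. The sketch below is a from-scratch, pure-Python stand-in, not TADM's or Mallet's API. It trains on feature dictionaries mapping names to numeric values, using batch gradient ascent with an L2 penalty (the Gaussian prior commonly used with maxent); the learning rate, epoch count and penalty strength are untuned assumptions.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]
BIAS = "__bias__"


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Dict[str, float], feats: Features) -> float:
    """P(positive | features) under the model."""
    z = weights.get(BIAS, 0.0)
    z += sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return _sigmoid(z)


def train_maxent(data: List[Tuple[Features, int]], epochs: int = 200,
                 lr: float = 0.5, l2: float = 0.01) -> Dict[str, float]:
    """Fit a binary maxent (logistic regression) model.

    data holds (features, label) pairs with label 1 = positive, 0 = negative.
    """
    weights: Dict[str, float] = {BIAS: 0.0}
    n = len(data)
    for _ in range(epochs):
        grad: Dict[str, float] = {name: 0.0 for name in weights}
        for feats, y in data:
            err = y - predict_proba(weights, feats)  # d log-likelihood / d score
            grad[BIAS] += err
            for name, value in feats.items():
                grad[name] = grad.get(name, 0.0) + err * value
        for name, g in grad.items():
            w = weights.get(name, 0.0)
            penalty = 0.0 if name == BIAS else l2 * w  # bias is not regularized
            weights[name] = w + lr * (g / n - penalty)
    return weights


training = [
    ({"smiley": 1.0}, 1),
    ({"smiley": 1.0, "repeated_punct": 1.0}, 1),
    ({"frowny": 1.0}, 0),
    ({"frowny": 1.0, "repeated_punct": 1.0}, 0),
]
model = train_maxent(training)
print(predict_proba(model, {"smiley": 1.0}), predict_proba(model, {"frowny": 1.0}))
```

A dedicated toolkit adds what this sketch leaves out: better optimizers, feature selection, and efficient handling of large sparse feature sets.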

To get training data, you can either use paid crowdsourcing such as Amazon's Mechanical Turk or label it yourself, perhaps with a friend's help. You'll need a lot of data. When data is scarce, approaches like active learning, ensembling, or boosting can improve your model's predictive strength, but it's important to test them against real-world data as best you can and pick whatever works best in a practical application.
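Of the approaches mentioned, active learning is the one aimed most directly at saving labeling effort. One common variant, uncertainty sampling, sends the phrases the current model is least sure about to the human labelers first. A minimal sketch, assuming any model that can return P(positive) for an item; the function name is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_for_labeling(pool: Sequence[T],
                        predict_proba: Callable[[T], float],
                        k: int) -> List[T]:
    """Uncertainty sampling: the k unlabeled items whose P(positive) is closest to 0.5."""
    ranked = sorted(pool, key=lambda item: abs(predict_proba(item) - 0.5))
    return list(ranked[:k])


# Toy pool where each item is already its own predicted probability.
print(select_for_labeling([0.9, 0.55, 0.1, 0.48, 0.7], lambda p: p, 2))
```

Each round you would label the selected phrases, add them to the training set, retrain, and repeat until accuracy on held-out data stops improving.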

If you're looking for papers, search for the term 'sentiment analysis' on Google Scholar. The Association for Computational Linguistics (ACL) has many free, useful papers from its conferences and journals that address the problem from both a linguistic and an algorithmic standpoint; I'd also browse their archives. Good luck!

answered Sep 30 '22 at 16:09 by Robert Elwell
