It appears that the simplest, naivest way to do basic sentiment analysis is with a Bayesian classifier (confirmed by what I'm finding here on SO). Any counter-arguments or other suggestions?
Sentiment analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text. This is a popular way for organizations to determine and categorize opinions about a product, service, or idea.
There are various ways to calculate a sentiment score, but the most common method is to use a dictionary of negative, neutral, or positive words. The text is then analyzed to see how many negative and positive words it contains. This can give us a good idea of the overall sentiment of the text.
Sentiment analysis studies the subjective information in an expression, that is, the opinions, appraisals, emotions, or attitudes towards a topic, person or entity. Expressions can be classified as positive, negative, or neutral. For example: “I really like the new design of your website!” → Positive.
A Bayesian classifier with a bag of words representation is the simplest statistical method. You can get significantly better results by moving to more advanced classifiers and feature representation, at the cost of more complexity.
Statistical methods aren't the only game in town. Rule based methods that have more understanding of the structure of the text are the other main option. From what I have seen, these don't actually perform as well as statistical methods.
I recommend Manning and Schütze's Foundations of Statistical Natural Language Processing chapter 16, Text Categorization.
I can't think of a simpler, more naive way to do Sentiment Analysis, but you might consider using a Support Vector Machine instead of Naive Bayes (in some machine learning toolkits, this can be a drop-in replacement). Have a look at "Thumbs up? Sentiment Classification using Machine Learning Techniques" by Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan which was one of the earliest papers on these techniques, and gives a good table of accuracy results on a family of related techniques, none of which are any more complicated (from a client perspective) than any of the others.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With