How is sentiment analysis computed in TextBlob?

I use the following to compute the sentiment of 200 short sentences (I did not use a training data set):

    from textblob import TextBlob

    blob = TextBlob(text)  # text holds the 200 sentences
    for sentence in blob.sentences:
        print(sentence.sentiment)

The analysis returns two values: polarity and subjectivity. From what I read online, the polarity score is a float within the range [-1.0, 1.0], where 0 indicates neutral, +1 a very positive attitude, and -1 a very negative attitude. Subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
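For reference, here is a tiny helper that buckets a polarity score into a coarse label under the interpretation above. The thresholds are my own illustration; TextBlob itself only returns the raw float:

    def label_polarity(polarity, eps=0.05):
        """Bucket a polarity in [-1.0, 1.0] into a coarse label.

        The +/-eps dead zone around 0 is an arbitrary choice for
        illustration, not part of TextBlob.
        """
        if polarity > eps:
            return "positive"
        if polarity < -eps:
            return "negative"
        return "neutral"

    print(label_polarity(0.8))   # positive
    print(label_polarity(-0.6))  # negative
    print(label_polarity(0.0))   # neutral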

So, now my question: How are those scores computed?

Almost half of the phrases get a polarity score of zero, and I am wondering whether zero indicates neutrality or rather that the phrase does not contain any words that carry a polarity. I have the same question about the other sentiment analyzer, NaiveBayesAnalyzer.

Thank you for your help!
Marie

asked Dec 29 '15 by MarieJ





2 Answers

According to TextBlob's creator, Steven Loria, TextBlob's sentiment analyzer delegates to pattern.en's sentiment module. pattern.en itself uses a dictionary-based approach with a few heuristics to handle, e.g., negation. You can find the source here; it is a vendorized version of pattern.en's text module, with minor tweaks for Python 3 compatibility.
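As a rough sketch of what a dictionary-based approach with a negation heuristic looks like (the mini-lexicon and the -0.5 negation flip below are made-up illustrations, not pattern.en's actual data or rules):

    # Hypothetical mini-lexicon: word -> polarity in [-1.0, 1.0].
    LEXICON = {"great": 0.8, "good": 0.7, "bad": -0.7, "terrible": -1.0}
    NEGATIONS = {"not", "never", "no"}

    def polarity(sentence):
        """Average the polarities of known words; dampen and flip a word
        that follows a negation. Words absent from the lexicon are
        skipped, so a sentence with no known words scores exactly 0.0."""
        words = sentence.lower().split()
        scores = []
        for i, word in enumerate(words):
            if word in LEXICON:
                score = LEXICON[word]
                if i > 0 and words[i - 1] in NEGATIONS:
                    score = -0.5 * score  # illustrative negation heuristic
                scores.append(score)
        return sum(scores) / len(scores) if scores else 0.0

    print(polarity("the movie was great"))      # 0.8
    print(polarity("the movie was not great"))  # -0.4
    print(polarity("the movie exists"))         # 0.0

Note the last case: a sentence with no lexicon words comes out as exactly 0.0, which is indistinguishable from genuine neutrality in the returned score.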

answered Sep 23 '22 by Pie-ton


The TextBlob NaiveBayesAnalyzer is based on NLTK's Naive Bayes classifier. The Naive Bayes algorithm in general is explained here: A simple explanation of Naive Bayes Classification

and its application to sentiment and objectivity is described here: http://nlp.stanford.edu/courses/cs224n/2009/fp/24.pdf

Basically you're right that certain words will be labeled something like "40% positive / 60% negative" based on how they were used in some body of training data (for the Stanford NLTK, the training data was movie reviews). Then the scores of all words in your sentence get multiplied to produce the sentence score.
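The multiplication described above can be sketched like this (the word likelihoods and priors are invented for illustration; in NLTK they are estimated from counts in the movie-review corpus):

    import math

    # Hypothetical per-word likelihoods P(word | class), e.g. learned
    # from word counts in a labeled training corpus.
    LIKELIHOODS = {
        "great":    {"pos": 0.40, "neg": 0.10},
        "terrible": {"pos": 0.05, "neg": 0.35},
        "plot":     {"pos": 0.20, "neg": 0.20},
    }
    PRIORS = {"pos": 0.5, "neg": 0.5}

    def classify(sentence):
        """Naive Bayes: score each class by prior * product of word
        likelihoods, done in log space to avoid underflow. Words not
        seen in training are simply skipped here."""
        words = sentence.lower().split()
        scores = {}
        for cls, prior in PRIORS.items():
            log_score = math.log(prior)
            for word in words:
                if word in LIKELIHOODS:
                    log_score += math.log(LIKELIHOODS[word][cls])
            scores[cls] = log_score
        return max(scores, key=scores.get)

    print(classify("great plot"))     # pos
    print(classify("terrible plot"))  # neg

Because unknown words are skipped, they contribute nothing to either class's score, which is consistent with the behavior described in the next paragraph.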

I haven't tested this, but I expect that if the library returns exactly 0.0, your sentence didn't contain any words that had a polarity in the NLTK training set. I suspect those words were excluded because 1) they were too rare in the training data, or 2) they carry no sentiment (stop words such as "the", "a", and "and").

That goes for the Naive Bayes analyzer. As for the PatternAnalyzer, the TextBlob docs say it is based on the "pattern" library, but they don't document how it works. I suspect something similar is happening there.

answered Sep 21 '22 by Luke