How is the Vader 'compound' polarity score calculated in Python NLTK?

Tags: python, nlp, nltk, sentiment-analysis, vader

I'm using the Vader SentimentAnalyzer to obtain polarity scores. I used the probability scores for positive/negative/neutral before, but I just realized that the "compound" score, ranging from -1 (most neg) to 1 (most pos), would provide a single measure of polarity. I wonder how the "compound" score is computed. Is that calculated from the [pos, neu, neg] vector?

asked Oct 30 '16 04:10

alicecongcong

2 Answers

The VADER algorithm outputs sentiment scores for four classes of sentiment (see https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L441):

neg: Negative
neu: Neutral
pos: Positive
compound: Compound (i.e. aggregated score)

Let's walk through the code. The first instance of compound is at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L421, where it is computed as:

compound = normalize(sum_s)

The normalize() function is defined as follows at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L107:

import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score/math.sqrt((score*score) + alpha)
    return norm_score

So there's a hyper-parameter alpha.
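
To get a feel for what alpha does, here is a minimal sketch that re-implements the formula above as a standalone function and evaluates it for a few raw sums. The input values are made up for illustration; real values depend on the text being scored.

```python
import math

def normalize(score, alpha=15):
    """Squash a raw valence sum into (-1, 1), mirroring the formula quoted above."""
    return score / math.sqrt(score * score + alpha)

# Illustrative raw sums only; these are not taken from VADER's lexicon.
for raw in (-8.0, -2.0, 0.0, 0.5, 2.0, 8.0):
    print(f"sum_s={raw:5.1f} -> compound={normalize(raw):.4f}")
```

With the default alpha=15, a raw sum of 0 maps to 0, and sums of larger magnitude move towards -1 or 1.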

As for sum_s, it is the sum of the values in the sentiments argument passed to the score_valence() function https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L413

If we trace this sentiments argument back, we see that it's computed in the polarity_scores() function at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L217:

def polarity_scores(self, text):
    """
    Return a float for sentiment strength based on the input text.
    Positive values are positive valence, negative value are negative
    valence.
    """
    sentitext = SentiText(text)
    #text, words_and_emoticons, is_cap_diff = self.preprocess(text)

    sentiments = []
    words_and_emoticons = sentitext.words_and_emoticons
    for item in words_and_emoticons:
        valence = 0
        i = words_and_emoticons.index(item)
        if (i < len(words_and_emoticons) - 1 and item.lower() == "kind" and \
            words_and_emoticons[i+1].lower() == "of") or \
            item.lower() in BOOSTER_DICT:
            sentiments.append(valence)
            continue

        sentiments = self.sentiment_valence(valence, sentitext, item, i, sentiments)

    sentiments = self._but_check(words_and_emoticons, sentiments)

Looking at the polarity_scores function, it iterates through the words and emoticons of the SentiText built from the input and calls the rule-based sentiment_valence() function to assign a valence score to each one (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L243); see Section 2.1.1 of http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

So going back to the compound score, we see that:

the compound score is a normalized score of sum_s,
sum_s is the sum of the valences computed from some heuristics and a sentiment lexicon (a.k.a. sentiment intensity), and
the normalized score is simply sum_s divided by the square root of (its square plus alpha), where the alpha parameter increases the denominator of the normalization function.
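
Written out as a formula, with v_i standing for the valence of the i-th token (the symbols are ours, not from the NLTK source) and ignoring any further adjustment score_valence() may make to sum_s before normalizing:

```latex
\text{sum\_s} = \sum_i v_i,
\qquad
\text{compound} = \frac{\text{sum\_s}}{\sqrt{\text{sum\_s}^2 + \alpha}},
\qquad
\alpha = 15 \text{ by default}
```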

Is that calculated from the [pos, neu, neg] vector?

Not really =)

If we take a look at the score_valence function https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L411, we see that the compound score is computed from sum_s before the pos, neg and neu scores are computed. Those come from _sift_sentiment_scores(), which computes the individual pos, neg and neu scores from the raw sentiment_valence() scores, without the sum.
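
That order of operations can be sketched as follows. This is a simplified stand-in, not the NLTK implementation: it assumes the per-token valences are already available, leaves out the punctuation-emphasis adjustment that the real code applies, and the +1/-1 offsets reflect our reading of _sift_sentiment_scores(). The function name score_valence_sketch and the sample valences are hypothetical.

```python
import math

def normalize(score, alpha=15):
    return score / math.sqrt(score * score + alpha)

def score_valence_sketch(sentiments):
    """Simplified, hypothetical stand-in for score_valence() (no punctuation emphasis)."""
    if not sentiments:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}

    # Step 1: compound comes straight from the summed raw valences.
    sum_s = float(sum(sentiments))
    compound = normalize(sum_s)

    # Step 2: pos/neg/neu are derived separately from the same raw valences.
    # The +1/-1 offsets are an assumption based on _sift_sentiment_scores(),
    # so that each sentiment-bearing token also counts as one token.
    pos_sum = sum(s + 1 for s in sentiments if s > 0)
    neg_sum = sum(s - 1 for s in sentiments if s < 0)
    neu_count = sum(1 for s in sentiments if s == 0)

    total = pos_sum + abs(neg_sum) + neu_count
    return {
        "neg": round(abs(neg_sum / total), 3),
        "neu": round(abs(neu_count / total), 3),
        "pos": round(abs(pos_sum / total), 3),
        "compound": round(compound, 4),
    }

# Hypothetical valences for a four-token text.
print(score_valence_sketch([1.9, 0.0, 0.0, -0.5]))
```

Note that compound never reads the pos, neu or neg values; both are derived from the same list of raw valences.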

If we take a look at this alpha mathemagic, it seems the output of the normalization is rather unstable (if left unconstrained), depending on the value of alpha:

[Plots for alpha=0, alpha=15, alpha=50000 and alpha=0.001: images not preserved]

It gets funky when it's negative:

[Plots for alpha=-10, alpha=-1,000,000 and alpha=-1,000,000,000: images not preserved]
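
The effect of alpha can also be explored numerically. Here is a small sketch using the same normalize() formula with a few arbitrary scores; the helper name try_normalize and the chosen scores are ours, and the error handling is only there to show where the formula breaks down.

```python
import math

def normalize(score, alpha=15):
    return score / math.sqrt(score * score + alpha)

def try_normalize(score, alpha):
    """Return the normalized score, or a short note if the formula is undefined."""
    try:
        return normalize(score, alpha)
    except ValueError:
        # score**2 + alpha < 0: square root of a negative number
        return "undefined (sqrt of negative)"
    except ZeroDivisionError:
        # score**2 + alpha == 0: division by zero
        return "undefined (division by zero)"

scores = (-4.0, -1.0, 0.0, 1.0, 4.0)
for alpha in (0, 0.001, 15, 50000, -10):
    print(f"alpha={alpha}:", [try_normalize(s, alpha) for s in scores])
```

With alpha=0 every non-zero score collapses to -1 or 1, a very large alpha flattens everything towards 0, and a negative alpha either pushes results outside [-1, 1] or makes the square root undefined when score*score + alpha is negative.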

answered Sep 22 '22 17:09

alvas

The "About the Scoring" section in the GitHub repo has a description.

answered Sep 21 '22 17:09

leonfrench
