
Is it possible to edit NLTK's vader sentiment lexicon?

I would like to add words to the vader_lexicon.txt to specify polarity scores to a word. What is the right way to do so?

I found this file in AppData\Roaming\nltk_data\sentiment\vader_lexicon. Each line consists of a word, its mean sentiment score, the standard deviation of that score, and an array of 10 raw ratings given by "10 independent human raters". [1] However, when I edited it, the results of the following code did not change:

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
s = sia.polarity_scores("my string here")

I think this text file is read when my code calls SentimentIntensityAnalyzer's constructor. [2] Do you have any ideas on how I can edit a pre-made lexicon?

Sources:

[1] https://github.com/cjhutto/vaderSentiment

[2] http://www.nltk.org/api/nltk.sentiment.html

asked Nov 08 '16 07:11 by noobalert



2 Answers

For anyone interested, this can also be achieved without manually editing the vader lexicon .txt file. Once loaded, the lexicon is a normal dictionary with words as keys and scores as values, so it can be updated in place. As provided by repoleved in this post:

from nltk.sentiment.vader import SentimentIntensityAnalyzer

new_words = {
    'foo': 2.0,
    'bar': -3.4,
}

SIA = SentimentIntensityAnalyzer()

SIA.lexicon.update(new_words)
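For context, here is a rough sketch of how the tab-separated text file maps onto the dictionary described above. As far as I can tell, only the first two fields of each line (the word and its mean score) end up in the dictionary, and the standard deviation and raw ratings are ignored at load time. The function name parse_lexicon and the two sample lines are made up for illustration and are not taken from the real lexicon.

```python
# Two made-up lines in the lexicon file layout:
# word <TAB> mean score <TAB> standard deviation <TAB> list of 10 raw ratings
SAMPLE = (
    "foo\t2.0\t0.0\t[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]\n"
    "bar\t-3.4\t0.4899\t[-3, -4, -3, -4, -3, -3, -4, -4, -3, -3]\n"
)


def parse_lexicon(text):
    """Build a {word: mean_score} dict from lexicon-formatted text (hypothetical helper)."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Keep only the first two tab-separated fields.
        word, mean_score = line.strip().split("\t")[:2]
        lexicon[word] = float(mean_score)
    return lexicon


lexicon = parse_lexicon(SAMPLE)
print(lexicon)
```

This is why a plain word-to-float mapping such as new_words above is all that update needs.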

If you wish to remove a word, use the dictionary's .pop method:

SIA = SentimentIntensityAnalyzer()

SIA.lexicon.pop('no')
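The two operations above can be combined in one small helper. This is only a sketch: apply_lexicon_changes is a hypothetical name, and demo_lexicon is a stand-in for SIA.lexicon with illustrative scores so that the snippet runs without NLTK or its data installed. It uses pop with a default so that removing a word that is not in the lexicon does not raise a KeyError.

```python
def apply_lexicon_changes(lexicon, additions=None, removals=()):
    """Add or overwrite scores, then drop unwanted words, in place (hypothetical helper)."""
    lexicon.update(additions or {})
    for word in removals:
        lexicon.pop(word, None)  # the default avoids KeyError for unknown words
    return lexicon


# Stand-in for SIA.lexicon; the scores are illustrative only.
demo_lexicon = {"good": 1.9, "no": -1.2}

apply_lexicon_changes(
    demo_lexicon,
    additions={"foo": 2.0, "bar": -3.4},
    removals=["no", "not-in-lexicon"],
)
print(demo_lexicon)
```

With NLTK you would pass SIA.lexicon as the first argument. As far as I understand, these changes live only in that analyzer object's in-memory dictionary: they are not written back to vader_lexicon.txt, so they need to be reapplied each time a new SentimentIntensityAnalyzer is created.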
answered Sep 24 '22 10:09 by Laurie


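I found the fix: I zipped the vader_lexicon folder that contains the txt file, and my edited version is now the one being accessed. It seems the analyzer reads the lexicon from vader_lexicon.zip rather than from the unzipped folder, which would explain why editing the extracted file alone had no effect.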
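The re-zipping step could also be scripted. The sketch below is an assumption about the layout rather than anything NLTK provides: rezip_lexicon is a hypothetical helper that expects a directory containing the edited vader_lexicon folder, and it overwrites any vader_lexicon.zip already there, so back up the original first. The demo runs against a temporary directory with a made-up lexicon line instead of a real nltk_data folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def rezip_lexicon(sentiment_dir):
    """Rebuild vader_lexicon.zip from the vader_lexicon folder inside sentiment_dir."""
    sentiment_dir = Path(sentiment_dir).resolve()
    archive = shutil.make_archive(
        str(sentiment_dir / "vader_lexicon"),  # archive path without the .zip suffix
        "zip",
        root_dir=sentiment_dir,
        base_dir="vader_lexicon",  # keeps the folder name inside the archive
    )
    return Path(archive)


# Demo on a throwaway directory; the lexicon line is made up.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "vader_lexicon"
    folder.mkdir()
    (folder / "vader_lexicon.txt").write_text(
        "foo\t2.0\t0.0\t[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]\n"
    )
    archive = rezip_lexicon(tmp)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive.name, names)
```

In practice you would point it at the sentiment directory under nltk_data, such as the AppData\Roaming\nltk_data\sentiment location mentioned in the question, and then create a new SentimentIntensityAnalyzer.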

answered Sep 22 '22 10:09 by noobalert