 

Edit vader_lexicon.txt in NLTK for Python to add words related to my domain


I am using VADER in NLTK to find the sentiment of each line in a file. I have two questions:

  1. I need to add words to vader_lexicon.txt, but its entries look like this:

assaults -2.5 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]

What do -2.5 and 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3] represent?

How should I write the entry for a new word? Say I have to add something like '100%' or 'A1'.

  2. I can also see lists of positive and negative words (.txt files) in the nltk_data\corpora\opinion_lexicon folder. How are these used? Can I add my words to these files too?
Asked Jul 25 '18 08:07 by Mighty


1 Answer

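VADER uses only the word and the first value (the mean sentiment rating) when classifying text. The second value is the standard deviation of the human ratings, and the bracketed list holds the ten individual ratings the mean was computed from. To add new words, you can create a dictionary of words and their sentiment values and merge it into the analyzer's lexicon with the dictionary's update method: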

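from nltk.sentiment.vader import SentimentIntensityAnalyzer

Analyzer = SentimentIntensityAnalyzer()
# Example entries: lowercase word -> mean sentiment score (-4 to +4)
your_dictionary = {'a1': 1.5, 'bearish': -1.5}
Analyzer.lexicon.update(your_dictionary)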
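To see what the numbers in a lexicon line mean, here is a minimal sketch that takes apart the "assaults" entry from the question. It assumes the tab-separated layout of vader_lexicon.txt: token, mean rating, standard deviation, and a list of ten human ratings on a -4 to +4 scale. The helper parse_lexicon_line is hypothetical and is not an NLTK function. The sketch keeps only the word and the mean, which is all NLTK's loader reads. It also checks that -2.5 and 0.92195 are the mean and the population standard deviation of the ten ratings.

```python
import ast
import statistics

# One entry from vader_lexicon.txt; fields are tab-separated in the real file
LINE = "assaults\t-2.5\t0.92195\t[-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]"


def parse_lexicon_line(line):
    """Hypothetical helper: split one lexicon line into its four fields."""
    word, mean, std, ratings = line.strip().split("\t")
    return word, float(mean), float(std), ast.literal_eval(ratings)


word, mean, std, ratings = parse_lexicon_line(LINE)

# VADER keeps only the first two fields: {word: mean rating}
lexicon_entry = {word: mean}

# The stored numbers are the mean and population standard deviation
# of the ten human ratings (rounded to 5 decimals in the file)
computed_mean = statistics.mean(ratings)
computed_std = statistics.pstdev(ratings)

print(lexicon_entry)                            # {'assaults': -2.5}
print(computed_mean, round(computed_std, 5))    # -2.5 0.92195
```

So for a new word, only a score in the second column matters to VADER. The standard deviation and the raw ratings document how the original human-rated entries were built.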

You can assign each word a sentiment value manually, based on how strong you perceive its sentiment to be (VADER's ratings run from -4 to +4). If that is impractical, you can assign one broad value per category instead (e.g. -1.5 for negative words and 1.5 for positive words).
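If you only have lists of domain-specific positive and negative words, the broad-value approach could be sketched as follows. The helper build_domain_lexicon, the example words and the ±1.5 strength are all placeholders. Keys are lowercased on the assumption that VADER lowercases each token before looking it up (my understanding of NLTK's implementation). The resulting dictionary is what you would pass to Analyzer.lexicon.update(...).

```python
def build_domain_lexicon(positive_words, negative_words, strength=1.5):
    """Hypothetical helper: give every word in each list one broad score.

    Returns a dict suitable for SentimentIntensityAnalyzer().lexicon.update().
    Keys are lowercased, assuming VADER looks tokens up in lowercase.
    A word that appears in both lists ends up with the negative score.
    """
    lexicon = {}
    for word in positive_words:
        lexicon[word.lower()] = strength
    for word in negative_words:
        lexicon[word.lower()] = -strength
    return lexicon


# Placeholder domain terms; replace them with your own lists
domain_lexicon = build_domain_lexicon(
    positive_words=["A1", "outperform"],
    negative_words=["volatile", "calamities"],
)
print(domain_lexicon)
# {'a1': 1.5, 'outperform': 1.5, 'volatile': -1.5, 'calamities': -1.5}

# With NLTK installed and the 'vader_lexicon' resource downloaded:
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Analyzer = SentimentIntensityAnalyzer()
# Analyzer.lexicon.update(domain_lexicon)
```

A single strength per category is crude. Words you care most about can still be overridden afterwards with hand-picked values in a second update call.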

You can use this script (not mine) to check whether your updates have been picked up:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the 'vader_lexicon' resource and NLTK's punkt tokenizer data
Analyzer = SentimentIntensityAnalyzer()

sentence = 'enter your text to test'

tokenized_sentence = nltk.word_tokenize(sentence)
pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token on its own and bucket it by its compound score
for word in tokenized_sentence:
    compound = Analyzer.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)
score = Analyzer.polarity_scores(sentence)
print('\nScores:', score)

Before updating VADER:

sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese markets'

Positive: []
Neutral: ['stocks', 'were', 'volatile', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'calamities', 'in', 'the', 'Chinese', 'markets']
Negative: []
Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

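After updating VADER with a finance-based lexicon (Financial_Lexicon is the answerer's own dictionary of finance terms and sentiment values, not part of NLTK):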

Analyzer.lexicon.update(Financial_Lexicon)
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese markets'

Positive: []
Neutral: ['stocks', 'were', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'in', 'the', 'Chinese', 'markets']
Negative: ['volatile', 'calamities']
Scores: {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}
Answered Oct 13 '22 02:10 by Laurie