I am using vader in nltk to find sentiments of each line in a file. I have 2 questions:
1. I want to add new words to vader_lexicon.txt; however, its syntax looks like: assaults -2.5 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]
What do -2.5 and 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3] represent? How should I code it for a new word? Say I have to add something like '100%', 'A1'.
2. There are also txt files in the nltk_data\corpora\opinion_lexicon folder. How are these getting utilised? Can I add my words in these txt files too?

I believe that vader only uses the word and the first value when classifying text. If you want to add new words, you can simply create a dictionary of words and their sentiment values, which can be added using the update function:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
Analyzer = SentimentIntensityAnalyzer()
# your_dictionary maps each word to a sentiment value, e.g. {'word': 1.5}
Analyzer.lexicon.update(your_dictionary)
You can manually assign each word a sentiment value based on its perceived intensity, or, if this is impractical, you can assign one broad value to each of the two categories (e.g. -1.5 for negative words and 1.5 for positive ones).
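As for what the numbers on a lexicon line mean: as far as I know, each entry in vader_lexicon.txt holds four fields, namely the token, the mean sentiment rating on a -4 to +4 scale, the standard deviation of the ratings, and the raw scores given by ten human raters. The sketch below checks this against the line from the question using only the standard library. The tab separators are an assumption about the file layout, since the question shows the fields separated by spaces.

```python
from statistics import mean, pstdev

# The entry from the question, rebuilt with tab separators (assumed file layout).
line = "assaults\t-2.5\t0.92195\t[-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]"

token, mean_field, std_field, ratings_field = line.split("\t")
ratings = [int(r) for r in ratings_field.strip("[]").split(",")]

# Second column: the average of the individual ratings.
print(token, float(mean_field), mean(ratings))        # assaults -2.5 -2.5
# Third column: their population standard deviation, rounded to 5 places.
print(float(std_field), round(pstdev(ratings), 5))    # 0.92195 0.92195
```

So for a new word only the word and a mean-style value between -4 and +4 should matter when updating the lexicon dictionary; the last two fields describe how much the raters agreed.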
You can use this script (not mine) to examine if your updates have been included:
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumes the vader lexicon and the tokenizer data were already fetched with nltk.download()
Analyzer = SentimentIntensityAnalyzer()
sentence = 'enter your text to test'
tokenized_sentence = word_tokenize(sentence)

pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token on its own and bucket it by its compound score
for word in tokenized_sentence:
    compound = Analyzer.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)

# Score the sentence as a whole
score = Analyzer.polarity_scores(sentence)
print('\nScores:', score)
Before updating vader:
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese markets'
Positive: []
Neutral: ['stocks', 'were', 'volatile', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'calamities', 'in', 'the', 'Chinese', 'markets']
Negative: []
Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
After updating vader with a finance-based lexicon:
Analyzer.lexicon.update(Financial_Lexicon)
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese markets'
Positive: []
Neutral: ['stocks', 'were', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'in', 'the', 'Chinese', 'markets']
Negative: ['volatile', 'calamities']
Scores: {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}
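The Financial_Lexicon used above isn't shown, so here is a minimal sketch of how such a dictionary could be put together from two word lists with broad values of 1.5 and -1.5. The word lists, the helper name build_lexicon and the values are illustrative assumptions, not the lexicon that produced the scores above. Keys are lowercased because, as far as I can tell, vader lowercases tokens before looking them up.

```python
def build_lexicon(positive_words, negative_words, pos_value=1.5, neg_value=-1.5):
    """Return a {word: sentiment value} dict that can be passed to lexicon.update()."""
    lexicon = {word.lower(): pos_value for word in positive_words}
    lexicon.update({word.lower(): neg_value for word in negative_words})
    return lexicon

# Hypothetical word lists; replace them with your own domain terms.
Financial_Lexicon = build_lexicon(
    positive_words=['rally', 'outperform'],
    negative_words=['volatile', 'calamities'],
)
print(Financial_Lexicon)

# With nltk available, this would then be applied as shown earlier:
# Analyzer.lexicon.update(Financial_Lexicon)
```

A word you have stronger feelings about can still be given its own value afterwards, e.g. Financial_Lexicon['calamities'] = -3.0.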