I have been working on research related to Twitter sentiment analysis, and I have a little knowledge of how to code in Python. Since my research involves coding, I have done some research on how to analyze sentiment using Python, and this is how far I have come: 1. Tokenization of tweets 2. POS tagging of the tokens. What remains is calculating the positive and negative scores of the sentiment, which is the issue I am facing now and where I need your help.
Below is my code example:
import nltk
sentence = "Iphone6 camera is awesome for low light "
token = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(token)
Therefore, I want to ask if anybody can show/guide me through an example of using Python with SentiWordNet to calculate the positive and negative scores of tweets that have already been POS tagged. Thanks in advance.
It's a little unclear what exactly your question is. Do you need a guide to using SentiWordNet? If so, check out this link:
http://www.nltk.org/howto/sentiwordnet.html
Since you've already tokenized and POS tagged the words, all you need to do now is use this syntax:
swn.senti_synset('breakdown.n.03')
Breaking down the argument: 'breakdown' is the word itself, 'n' is the WordNet POS tag (here, noun), and '03' is the sense number. So for each tuple in your tagged array, build a string in this format and pass it to the senti_synset function to get the positive, negative, and objective scores for that word.
Caveat: the POS tagger gives you a different tag than the one senti_synset accepts. Use the following mapping to convert to synset notation.
n - NOUN
v - VERB
a - ADJECTIVE
s - ADJECTIVE SATELLITE
r - ADVERB
(Credits to Using Sentiwordnet 3.0 for the above notation)
That being said, it is generally not a great idea to use SentiWordNet for Twitter sentiment analysis, and here's why:
Tweets are filled with typos and non-dictionary words that SentiWordNet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions. As for the training dataset for the classifier, either manually annotate one or use a pre-labelled set such as the Sentiment140 corpus.
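As a sketch of the Naive Bayes route using NLTK's built-in classifier (the tiny hand-labelled training set below is made up for illustration; a real project would build features from a labelled corpus such as Sentiment140):

```python
from nltk.classify import NaiveBayesClassifier

def to_features(tokens):
    # Simple bag-of-words features: each token present maps to True
    return {t: True for t in tokens}

# Hypothetical toy training data, stand-in for a real labelled corpus
train = [
    (to_features(['awesome', 'camera']), 'pos'),
    (to_features(['love', 'screen']), 'pos'),
    (to_features(['terrible', 'battery']), 'neg'),
    (to_features(['hate', 'lag']), 'neg'),
]

clf = NaiveBayesClassifier.train(train)
label = clf.classify(to_features(['awesome', 'screen']))
print(label)  # 'pos'
```

Because the classifier learns from whatever labelled tweets you feed it, it copes with slang and typos far better than a dictionary lookup, at the cost of needing training data.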
If you are not interested in performing the sentiment analysis yourself but just need a sentiment tag for a given tweet, you can always use the Sentiment140 API for this purpose.
@Saravana Kumar has a wonderful answer. To add detailed code to it, I am writing this. I have referred to https://nlpforhackers.io/sentiment-analysis-intro/
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

def penn_to_wn(tag):
    """Convert between the PennTreebank tags and simple WordNet tags."""
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

lemmatizer = WordNetLemmatizer()

def get_sentiment(word, tag):
    """Return a list of pos, neg and objective scores.
    Returns an empty list if the word is not present in SentiWordNet."""
    wn_tag = penn_to_wn(tag)
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
        return []
    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        return []
    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        return []
    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())
    return [swn_synset.pos_score(), swn_synset.neg_score(), swn_synset.obj_score()]

ps = PorterStemmer()
words_data = ['this', 'movie', 'is', 'wonderful']
# words_data = [ps.stem(x) for x in words_data]  # if you want to further stem the words
pos_val = nltk.pos_tag(words_data)
senti_val = [get_sentiment(x, y) for (x, y) in pos_val]
print(f"pos_val is {pos_val}")
print(f"senti_val is {senti_val}")
Output
pos_val is [('this', 'DT'), ('movie', 'NN'), ('is', 'VBZ'), ('wonderful', 'JJ')]
senti_val is [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]