sentiwordnet scoring with python

I have been working on research related to Twitter sentiment analysis, and I have a little knowledge of how to code in Python. Since my research involves coding, I have looked into how to analyze sentiment using Python, and this is how far I have come: 1. tokenization of tweets; 2. POS tagging of the tokens. What remains is calculating the positive and negative sentiment scores, which is the issue I am facing now and where I need your help.

Below is my code example:

import nltk
sentence = "Iphone6 camera is awesome for low light "
token = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(token)

Therefore, I want to ask if anybody can show me or guide me through an example of using Python with SentiWordNet to calculate the positive and negative scores of tweets that have already been POS tagged. Thanks in advance.

asked Jul 08 '16 by pechdara

2 Answers

It's a little unclear what exactly your question is. Do you need a guide to using SentiWordNet? If so, check out this link:

http://www.nltk.org/howto/sentiwordnet.html

Since you've already tokenized and POS-tagged the words, all you need now is this call (after `from nltk.corpus import sentiwordnet as swn`):

swn.senti_synset('breakdown.n.03')

Breaking down the argument:

  • 'breakdown' = the word you need scores for
  • 'n' = part of speech (noun)
  • '03' = sense number ('01' is the most common sense; higher numbers indicate less common senses)

So for each tuple in your tagged array, create a string as above and pass it to the senti_synset function to get the positive, negative and objective score for that word.

Caveat: the POS tagger gives you different tags than the ones senti_synset accepts. Use the following mapping to convert to synset notation.

n - NOUN 
v - VERB 
a - ADJECTIVE 
s - ADJECTIVE SATELLITE 
r - ADVERB 

(Credits to Using Sentiwordnet 3.0 for the above notation)
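As a sketch, that mapping can be written as a small helper (the name `penn_to_swn_pos` is hypothetical; note the satellite-adjective tag 's' only occurs inside WordNet itself, never in tagger output):

```python
def penn_to_swn_pos(penn_tag):
    """Map a Penn Treebank tag to SentiWordNet's POS letter,
    or None when there is no counterpart (e.g. DT, IN)."""
    if penn_tag.startswith('J'):
        return 'a'   # adjective
    if penn_tag.startswith('N'):
        return 'n'   # noun
    if penn_tag.startswith('R'):
        return 'r'   # adverb
    if penn_tag.startswith('V'):
        return 'v'   # verb
    return None

print(penn_to_swn_pos('JJ'))   # a
print(penn_to_swn_pos('NN'))   # n
```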

That being said, it is generally not a great idea to use Sentiwordnet for Twitter sentiment analysis and here's why,

Tweets are filled with typos and non-dictionary words that SentiWordNet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS-tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions. As for the training dataset for the classifier, either manually annotate one or use a pre-labelled set such as the Sentiment140 corpus.
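As a sketch of the classifier route, NLTK's built-in Naive Bayes can be trained on simple bag-of-words features; the four hand-labelled examples below are toy stand-ins for a real corpus such as Sentiment140:

```python
import nltk

def features(text):
    # Bag-of-words feature dict: each lowercase token maps to True
    return {word: True for word in text.lower().split()}

# Tiny hand-labelled training set (placeholder for a real corpus)
train = [
    (features("awesome great love it"), "pos"),
    (features("camera is awesome"), "pos"),
    (features("terrible hate this phone"), "neg"),
    (features("worst battery ever"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("awesome phone love it")))
```

With a realistically sized training set the same two lines (`train`, `classify`) are all that changes.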

If you are uninterested in actually performing the sentiment analysis but need a sentiment tag for a given tweet, you can always use the Sentiment140 API for this purpose.

answered Sep 23 '22 by Saravana Kumar

@Saravana Kumar has a wonderful answer.

To add detailed code to it, I am writing this. I have referred to https://nlpforhackers.io/sentiment-analysis-intro/

import nltk  # needed for nltk.pos_tag below
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import PorterStemmer

def penn_to_wn(tag):
    """
    Convert Penn Treebank tags to simple WordNet tags
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

def get_sentiment(word,tag):
    """ returns list of pos neg and objective score. But returns empty list if not present in senti wordnet. """

    wn_tag = penn_to_wn(tag)
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
        return []

    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        return []

    # Look up the lemma (the raw word form may not be in WordNet)
    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        return []

    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())

    return [swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score()]


ps = PorterStemmer()
words_data = ['this','movie','is','wonderful']
# words_data = [ps.stem(x) for x in words_data] # if you want to further stem the word

pos_val = nltk.pos_tag(words_data)
senti_val = [get_sentiment(x,y) for (x,y) in pos_val]

print(f"pos_val is {pos_val}")
print(f"senti_val is {senti_val}")

Output

pos_val is [('this', 'DT'), ('movie', 'NN'), ('is', 'VBZ'), ('wonderful', 'JJ')]
senti_val is [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]
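One way to turn these per-word scores into a crude sentence-level polarity (a hypothetical aggregation, not part of the answer above) is to sum positive minus negative over the words that got a score:

```python
# senti_val as produced above; empty lists are words SentiWordNet skipped
senti_val = [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

# Sum (positive - negative) over scored words for a net polarity
polarity = sum(s[0] - s[1] for s in senti_val if s)
print(polarity)  # 0.75 -> net positive
```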
answered Sep 24 '22 by shantanu pathak