Sentiment analysis with NLTK python for sentences using sample data or webservice?

Tags: classification, nlp, nltk, weka

I am embarking on an NLP project for sentiment analysis.

I have successfully installed NLTK for Python (it seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task.

Here is my task:

1. I start with one long piece of data (let's say several hundred tweets on the subject of the UK election, fetched from their webservice).
2. I would like to break this up into sentences (or chunks no longer than 100 or so characters). I guess I can just do this in Python?
3. Then I want to search through all the sentences for specific terms within each sentence, e.g. "David Cameron".
4. Then I would like to check for positive/negative sentiment in each sentence and count them accordingly.

NB: I am not too worried about accuracy, because my data sets are large, and I am not too worried about sarcasm either.

Here are the troubles I am having:

1. All the data sets I can find, e.g. the movie review corpus that comes with NLTK, aren't in webservice format. It looks like they have had some processing done already. As far as I can see, the processing (by Stanford) was done with WEKA. Is it not possible for NLTK to do all this on its own? All of these data sets have already been organised into positive/negative, e.g. the polarity dataset at http://www.cs.cornell.edu/People/pabo/movie-review-data/ How is this done? (To organise the sentences by sentiment, is it definitely WEKA, or something else?)
2. I am not sure I understand why WEKA and NLTK would be used together. They seem to do much the same thing. If I'm processing the data with WEKA first to find sentiment, why would I need NLTK? Is it possible to explain why this might be necessary?

I have found a few scripts that get somewhat near this task, but all of them use the same pre-processed data. Is it not possible to process my own data to find sentiment in sentences, rather than using the data samples given in the link?

Any help is much appreciated and will save me much hair!

Cheers, Ke

asked May 14 '10 07:05

Ke.

1 Answer

The movie review data has already been labeled by humans as positive or negative (the person who wrote the review gave the movie a rating, which is used to determine polarity). These gold-standard labels allow you to train a classifier, which you could then use on other movie reviews. You could train a classifier in NLTK with that data, but a model trained on movie reviews learns the vocabulary of movie reviews, so applying it to election tweets might be less accurate than randomly guessing positive or negative. Alternatively, you can go through and label a few thousand tweets yourself as positive or negative and use those as your training set.

For a description of using Naive Bayes for sentiment analysis with NLTK, see: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/

Then, in that code, instead of using the movie corpus, use your own labeled data to build the word features (in the word_feats function).
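
As a rough illustration of that idea, here is a minimal sketch that trains NLTK's NaiveBayesClassifier on a handful of hand-labeled tweets and then counts the predicted sentiment of sentences that mention a given name. The example tweets, the regex-based `tokenize` helper, the crude sentence splitter and the `count_sentiment` function are all assumptions made up for this sketch, not code from the linked post, and six training examples are far too few for real use.

```python
import re
from collections import Counter

from nltk.classify import NaiveBayesClassifier


def word_feats(words):
    """Bag-of-words features: every token that is present maps to True."""
    return {word: True for word in words}


def tokenize(text):
    """Lowercase the text and pull out word-like tokens (deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical hand-labeled tweets; a real training set needs far more examples.
labeled_tweets = [
    ("Great speech today, really impressed", "pos"),
    ("Love the new policy, brilliant idea", "pos"),
    ("What a good and honest answer", "pos"),
    ("Terrible debate performance, very disappointing", "neg"),
    ("Awful policy, a complete disaster", "neg"),
    ("Bad answer, I hate this campaign", "neg"),
]

# Build (features, label) pairs from your own data instead of the movie corpus.
train_set = [(word_feats(tokenize(text)), label) for text, label in labeled_tweets]
classifier = NaiveBayesClassifier.train(train_set)


def count_sentiment(texts, target):
    """Count predicted labels for the sentences that mention `target`."""
    counts = Counter()
    for text in texts:
        # Crude sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if target.lower() in sentence.lower():
                label = classifier.classify(word_feats(tokenize(sentence)))
                counts[label] += 1
    return counts


new_tweets = [
    "David Cameron gave a great speech. The weather was awful.",
    "Terrible answer from David Cameron!",
    "Nothing about him here.",
]
counts = count_sentiment(new_tweets, "David Cameron")
print(counts)
```

The sentence splitting and name matching are plain Python here; NLTK is only used for the classifier itself. With a real labeled set you would probably also want to hold out some tweets to check accuracy before trusting the counts.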

answered Oct 17 '22 06:10

ealdent
