nltk NaiveBayesClassifier training for sentiment analysis


I am training NLTK's NaiveBayesClassifier in Python on labeled sentences, and it raises the error below. I do not understand what is causing it, and any help would be appreciated.

I have tried many other input formats, but the error remains. The code is given below:

from text.classifiers import NaiveBayesClassifier
from text.blob import TextBlob

train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ("What an awesome view", 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this", 'neg'),
         ('He is my sworn enemy!', 'neg'),
         ('My boss is horrible.', 'neg')
]

test = [('The beer was good.', 'pos'),
        ('I do not enjoy my job', 'neg'),
        ("I ain't feeling dandy today.", 'neg'),
        ("I feel amazing!", 'pos'),
        ('Gary is a friend of mine.', 'pos'),
        ("I can't believe I'm doing this.", 'neg')
]

classifier = nltk.NaiveBayesClassifier.train(train)

I am including the traceback below.

Traceback (most recent call last):
  File "C:\Users\5460\Desktop\train01.py", line 15, in <module>
    all_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
  File "C:\Users\5460\Desktop\train01.py", line 15, in <genexpr>
    all_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 87, in word_tokenize
    return _word_tokenize(text)
  File "C:\Python27\lib\site-packages\nltk\tokenize\treebank.py", line 67, in tokenize
    text = re.sub(r'^\"', r'``', text)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer


asked Dec 29 '13 17:12

student001

1 Answer

You need to change your data structure. Here is your train list as it currently stands:

>>> train = [('I love this sandwich.', 'pos'),
...          ('This is an amazing place!', 'pos'),
...          ('I feel very good about these beers.', 'pos'),
...          ('This is my best work.', 'pos'),
...          ("What an awesome view", 'pos'),
...          ('I do not like this restaurant', 'neg'),
...          ('I am tired of this stuff.', 'neg'),
...          ("I can't deal with this", 'neg'),
...          ('He is my sworn enemy!', 'neg'),
...          ('My boss is horrible.', 'neg')]

The problem is that the classifier expects the first element of each tuple to be a dictionary of features, not a raw string. The following converts your list into a data structure that the classifier can work with:

>>> from nltk.tokenize import word_tokenize # or use some other tokenizer
>>> all_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
>>> t = [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in train]

Your data should now be structured like this:

>>> t
[({'this': True, 'love': True, 'deal': False, 'tired': False, 'feel': False, 'is': False, 'am': False, 'an': False, 'sandwich': True, 'ca': False, 'best': False, '!': False, 'what': False, '.': True, 'amazing': False, 'horrible': False, 'sworn': False, 'awesome': False, 'do': False, 'good': False, 'very': False, 'boss': False, 'beers': False, 'not': False, 'with': False, 'he': False, 'enemy': False, 'about': False, 'like': False, 'restaurant': False, 'these': False, 'of': False, 'work': False, "n't": False, 'i': False, 'stuff': False, 'place': False, 'my': False, 'view': False}, 'pos'), . . .]
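One detail worth noticing in the output above: 'i' is False for "I love this sandwich." because all_words holds lowercased words while the membership test runs against tokens that were not lowercased. The following is a minimal sketch of the same conversion with consistent lowercasing, tokenizing each sentence only once. The helper names (simple_tokenize, build_vocabulary, extract_features, build_feature_sets) are hypothetical, and the regex tokenizer is only a stand-in so the sketch runs without NLTK; it does not split contractions the way word_tokenize does.

```python
import re


def simple_tokenize(text):
    """Stand-in tokenizer (an assumption, not NLTK's): lowercase the text,
    then return runs of letters/digits/apostrophes and single punctuation marks."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def build_vocabulary(labeled_sentences, tokenizer=simple_tokenize):
    """Collect every distinct token that occurs in the training sentences."""
    return {token
            for sentence, _label in labeled_sentences
            for token in tokenizer(sentence)}


def extract_features(sentence, vocabulary, tokenizer=simple_tokenize):
    """Map each vocabulary word to True/False depending on whether it occurs
    in the sentence. The sentence is tokenized once, not once per word."""
    tokens = set(tokenizer(sentence))
    return {word: (word in tokens) for word in vocabulary}


def build_feature_sets(labeled_sentences, vocabulary, tokenizer=simple_tokenize):
    """Turn (sentence, label) pairs into (feature dict, label) pairs."""
    return [(extract_features(sentence, vocabulary, tokenizer), label)
            for sentence, label in labeled_sentences]


# Two sentences from the question's training data, for illustration.
train = [('I love this sandwich.', 'pos'), ('My boss is horrible.', 'neg')]
vocabulary = build_vocabulary(train)
feature_sets = build_feature_sets(train, vocabulary)
```

To keep NLTK's tokenization, a callable such as lambda s: word_tokenize(s.lower()) could be passed as the tokenizer argument. Note that lowercasing the training sentences changes the feature values, so the numbers reported by the classifier would differ from the ones shown in this answer.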

Note that the first element of each tuple is now a dictionary. With the data in this form, you can train the classifier like so:

>>> import nltk
>>> classifier = nltk.NaiveBayesClassifier.train(t)
>>> classifier.show_most_informative_features()
Most Informative Features
                    this = True              neg : pos    =      2.3 : 1.0
                    this = False             pos : neg    =      1.8 : 1.0
                      an = False             neg : pos    =      1.6 : 1.0
                       . = True              pos : neg    =      1.4 : 1.0
                       . = False             neg : pos    =      1.4 : 1.0
                 awesome = False             neg : pos    =      1.2 : 1.0
                      of = False             pos : neg    =      1.2 : 1.0
                    feel = False             neg : pos    =      1.2 : 1.0
                   place = False             neg : pos    =      1.2 : 1.0
                horrible = False             pos : neg    =      1.2 : 1.0

To use the classifier, begin with a test sentence:

>>> test_sentence = "This is the best band I've ever heard!"

Then, you tokenize the sentence and figure out which words the sentence shares with all_words. These constitute the sentence's features.

>>> test_sent_features = {word: (word in word_tokenize(test_sentence.lower())) for word in all_words}

Your features will now look like this:

>>> test_sent_features
{'love': False, 'deal': False, 'tired': False, 'feel': False, 'is': True, 'am': False, 'an': False, 'sandwich': False, 'ca': False, 'best': True, '!': True, 'what': False, 'i': True, '.': False, 'amazing': False, 'horrible': False, 'sworn': False, 'awesome': False, 'do': False, 'good': False, 'very': False, 'boss': False, 'beers': False, 'not': False, 'with': False, 'he': False, 'enemy': False, 'about': False, 'like': False, 'restaurant': False, 'this': True, 'of': False, 'work': False, "n't": False, 'these': False, 'stuff': False, 'place': False, 'my': False, 'view': False}

Then you simply classify those features:

>>> classifier.classify(test_sent_features)
'pos' # note 'best' == True in the sentence features above

The classifier labels this test sentence as positive.
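The question also defines a labeled test list that is never used above. As a rough sketch, the same feature extraction can be applied to each test sentence and the predictions compared with the given labels. The function names sentence_features and score_on_test_set are hypothetical; the sketch only assumes that the classifier object has a classify(features) method, as the trained classifier above does, and that tokenizer is a callable such as word_tokenize.

```python
def sentence_features(sentence, all_words, tokenizer):
    """Build the True/False feature dict for one sentence, lowercasing it
    first as in the test-sentence example above."""
    tokens = set(tokenizer(sentence.lower()))
    return {word: (word in tokens) for word in all_words}


def score_on_test_set(classifier, test, all_words, tokenizer):
    """Return the fraction of (sentence, label) pairs in `test` for which
    classifier.classify() predicts the given label."""
    if not test:
        raise ValueError("test must contain at least one labeled sentence")
    correct = sum(
        1
        for sentence, label in test
        if classifier.classify(sentence_features(sentence, all_words, tokenizer)) == label
    )
    return correct / len(test)


# Intended use with the objects defined earlier in this answer:
#     score_on_test_set(classifier, test, all_words, word_tokenize)
```

NLTK also ships nltk.classify.accuracy, which takes the classifier and a list of (features, label) pairs and reports the same kind of score. With only ten training sentences, any accuracy figure on six test sentences should be read as a sanity check rather than a meaningful evaluation.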


answered Oct 10 '22 13:10

Justin O Barber
