
nltk.word_tokenize() giving AttributeError: 'module' object has no attribute 'defaultdict'

I am new to NLTK and was trying out some basics.

import nltk
nltk.word_tokenize("Tokenize me")

gives me the following error:

Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    nltk.word_tokenize("hi im no onee")
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 101, in word_tokenize
    return [token for sent in sent_tokenize(text, language)
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 85, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "C:\Python27\lib\site-packages\nltk\data.py", line 786, in load
    resource_val = pickle.load(opened_resource)
AttributeError: 'module' object has no attribute 'defaultdict'

Could someone please tell me how to fix this error?

Kantajit asked Jul 08 '15 14:07

People also ask

What is NLTK word_tokenize?

word_tokenize is a function in the NLTK library that splits a given sentence into words. (Figure 1: splitting of a sentence into words.) In Python, we can tokenize with the help of the Natural Language Toolkit (NLTK) library.

How do you Tokenize a sentence using the NLTK package?

NLTK contains a tokenize module with two main functions. Word tokenize: we use the word_tokenize() method to split a sentence into tokens or words. Sentence tokenize: we use the sent_tokenize() method to split a document or paragraph into sentences.

What does NLTK Punkt do?

The nltk.tokenize.punkt module provides a tokenizer that divides a text into a list of sentences, using an unsupervised algorithm to build a model for abbreviations, collocations, and words that start sentences.


1 Answer

I just checked it on my system.

Fix:

>>> import nltk
>>> nltk.download('all')

Then everything worked fine.

>>> import nltk
>>> nltk.word_tokenize("Tokenize me")
['Tokenize', 'me']
Manoj answered Sep 21 '22 23:09