I am new to NLTK and was trying some basics.
import nltk
nltk.word_tokenize("Tokenize me")
gives me the following error:
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    nltk.word_tokenize("hi im no onee")
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 101, in word_tokenize
    return [token for sent in sent_tokenize(text, language)
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 85, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "C:\Python27\lib\site-packages\nltk\data.py", line 786, in load
    resource_val = pickle.load(opened_resource)
AttributeError: 'module' object has no attribute 'defaultdict'
Can someone please tell me how to fix this error?
word_tokenize is a function in the Natural Language Toolkit (NLTK) library for Python that splits a given sentence into words. [Figure 1: Splitting of a sentence into words.]
NLTK contains a module called tokenize, which offers two main kinds of tokenization:
Word tokenize: We use the word_tokenize() function to split a sentence into tokens or words.
Sentence tokenize: We use the sent_tokenize() function to split a document or paragraph into sentences.
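The nltk.tokenize.punkt module: this tokenizer divides a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviations, collocations, and words that start sentences. As the traceback in the question shows, word_tokenize calls sent_tokenize, which loads this Punkt model from tokenizers/punkt, so word_tokenize cannot work without that resource.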
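To make the two levels of tokenization concrete, here is a minimal sketch using only Python's re module. It is not NLTK's implementation and not the Punkt algorithm. The names toy_sent_tokenize and toy_word_tokenize are hypothetical, and the sentence splitter naively breaks after '.', '!' or '?', so it will mis-split abbreviations such as "Dr.", which is the kind of case Punkt's learned model is meant to handle.

```python
import re


def toy_sent_tokenize(text):
    """Naive sentence split: break on whitespace that follows '.', '!' or '?'.

    Hypothetical stand-in for illustration only; no abbreviation handling.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_word_tokenize(sentence):
    """Naive word split: runs of word characters, plus each punctuation
    mark as its own token. Hypothetical stand-in, not NLTK's tokenizer."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Tokenize me. Then tokenize this too!"
sentences = toy_sent_tokenize(text)
words = [toy_word_tokenize(s) for s in sentences]

print(sentences)
print(words)
```

The structure mirrors what the traceback shows for NLTK: the text is first split into sentences, and each sentence is then split into words.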
I just checked this on my system.
Fix:
>>> import nltk
>>> nltk.download('all')
After that, everything worked fine:
>>> import nltk
>>> nltk.word_tokenize("Tokenize me")
['Tokenize', 'me']
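If the error persists after the download, one more thing may be worth checking. This is an assumption on my part and is not confirmed anywhere in this thread: the traceback fails inside pickle.load while looking up defaultdict on a module, which could happen if a local file such as collections.py shadows the standard library module of the same name. A minimal sketch of that check, where describe_module is a hypothetical helper:

```python
import collections


def describe_module(mod):
    """Return (file path, whether the module exposes defaultdict).

    Hypothetical helper for diagnosing a shadowed standard library module.
    """
    path = getattr(mod, "__file__", None)
    return path, hasattr(mod, "defaultdict")


path, has_defaultdict = describe_module(collections)

# A path inside your own project directory, or a missing defaultdict,
# would suggest that a local file is shadowing the standard module.
print("collections loaded from:", path)
print("has defaultdict:", has_defaultdict)
```

If the printed path points into your own working directory rather than the Python installation, renaming that file (and removing its compiled .pyc copy) would be the thing to try.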