 

ImportError: cannot import name PunktWordTokenizer

I was trying to use PunktWordTokenizer, but importing it raised an error:

from nltk.tokenize.punkt import PunktWordTokenizer

This gave the following error message:

Traceback (most recent call last):
  File "file", line 5, in <module>
    from nltk.tokenize.punkt import PunktWordTokenizer
ImportError: cannot import name PunktWordTokenizer

I've checked that NLTK is installed and that PunktWordTokenizer is also installed using nltk.download(). I need some help with this.

Hash asked May 29 '17 09:05


2 Answers

PunktWordTokenizer was previously exposed to users, but it no longer is. You can use WordPunctTokenizer instead:

from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")

The difference is:

PunktWordTokenizer splits on punctuation but keeps it attached to the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.

For example, given the input "This's a test":

PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
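The WordPunctTokenizer behavior can be reproduced with the standard library alone. This is a minimal sketch, assuming that NLTK's WordPunctTokenizer is a RegexpTokenizer over the pattern \w+|[^\w\s]+. The second pattern and both function names are hypothetical illustrations, not NLTK APIs, and the second pattern only approximates the PunktWordTokenizer output listed above for this particular input.

```python
import re

# Assumption: NLTK's WordPunctTokenizer matches runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

# Hypothetical pattern (not from NLTK): an apostrophe followed by letters
# is kept as one token, mimicking the PunktWordTokenizer output for "This's".
KEEP_CLITIC = re.compile(r"\w+|'\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Split text into word runs and punctuation runs."""
    return WORD_PUNCT.findall(text)


def keep_clitic_tokenize(text):
    """Like word_punct_tokenize, but "'s" stays a single token."""
    return KEEP_CLITIC.findall(text)


if __name__ == "__main__":
    sample = "This's a test"
    print(word_punct_tokenize(sample))   # ['This', "'", 's', 'a', 'test']
    print(keep_clitic_tokenize(sample))  # ['This', "'s", 'a', 'test']
```

Under the first pattern, every run of non-word, non-space characters becomes its own token, which is why the apostrophe in "This's" is separated from the "s".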

Vivek Puurkayastha answered Nov 04 '22 02:11


There appears to be a regression related to PunktWordTokenizer in 3.0.2. The issue was not present in 3.0.1, so rolling back to that version or earlier fixes it.

>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer

To solve this, try pip install -U nltk to upgrade your NLTK version.
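Before upgrading or rolling back, it can help to confirm which NLTK version is installed and which tokenizer names it actually exposes. This is a minimal diagnostic sketch, assuming Python 3.8+ (for importlib.metadata). The helper names installed_nltk_version and importable_tokenizers are hypothetical, and the check does not assume which release removed PunktWordTokenizer; it simply reports what the installed package provides.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def installed_nltk_version():
    """Return the installed NLTK version string, or None if NLTK is missing."""
    try:
        return version("nltk")
    except PackageNotFoundError:
        return None


def importable_tokenizers(names, module="nltk.tokenize"):
    """Map each name to True if `module` can be imported and has that attribute."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        # Covers ModuleNotFoundError too: nothing from `module` is importable.
        return {name: False for name in names}
    return {name: hasattr(mod, name) for name in names}


if __name__ == "__main__":
    print("nltk version:", installed_nltk_version())
    results = importable_tokenizers(["PunktWordTokenizer", "WordPunctTokenizer"])
    for name, ok in results.items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

To check the exact import path from the question, pass module="nltk.tokenize.punkt" instead of the default.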

Shubham R answered Nov 04 '22 00:11