I was trying to use PunktWordTokenizer and got the error below.
from nltk.tokenize.punkt import PunktWordTokenizer
This gave the following error message:
Traceback (most recent call last):
  File "file", line 5, in <module>
    from nltk.tokenize.punkt import PunktWordTokenizer
ImportError: cannot import name PunktWordTokenizer
I've checked that NLTK is installed and that the punkt data is also downloaded using nltk.download(). I need some help with this.
PunktWordTokenizer was previously exposed to users, but it is no longer part of the public API. You can use WordPunctTokenizer instead.
from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")
The difference is: PunktWordTokenizer splits on punctuation but keeps it attached to the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.
For example, given the input: This's a test
PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
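If NLTK isn't available, WordPunctTokenizer's behavior can be approximated with a plain regular expression, since it tokenizes by alternating runs of word characters and runs of non-space punctuation (a minimal sketch using Python's re module, not NLTK's actual implementation):

```python
import re

def word_punct_tokenize(text):
    """Split text into runs of word characters or runs of punctuation,
    mimicking WordPunctTokenizer's output."""
    return re.findall(r"\w+|[^\w\s]+", text)

print(word_punct_tokenize("This's a test"))
# ['This', "'", 's', 'a', 'test']
```

Note how the apostrophe becomes its own token, matching the WordPunctTokenizer output shown above.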
There appears to be a regression related to PunktWordTokenizer in 3.0.2. The issue was not present in 3.0.1; rolling back to that version or earlier fixes it.
>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer
To resolve this, try pip install -U nltk to upgrade your NLTK version.