I am trying to use part-of-speech tagging in NLTK and have used these commands:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    nltk.pos_tag(text)
  File "C:\Python27\lib\site-packages\nltk\tag\__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "C:\Python27\lib\site-packages\nltk\data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "C:\Python27\lib\site-packages\nltk\data.py", line 686, in _open
    return find(path).open()
  File "C:\Python27\lib\site-packages\nltk\data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
found. Please use the NLTK Downloader to obtain the resource:
So the error message says that english.pickle was not found.
However, I have downloaded the whole corpora, and the english.pickle file is there in the maxent_treebank_pos_tagger directory.
What can I do to get this to work?
What are POS tags used for? POS tags let automatic text processing tools take into account which part of speech each word is, so that linguistic criteria can be used in addition to statistics.
In Natural Language Processing (NLP), POS tagging is an essential building block for language models and for interpreting text. While POS tags feed higher-level NLP functions, it is worth understanding them on their own, and they can be useful directly in text analysis.
What is part-of-speech (POS) tagging? It is the process of converting a sentence into a list of words and then into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag and signifies whether the word is a noun, adjective, verb, and so on.
Part-of-speech tagging simply means assigning a part of speech to each word in a sentence. Unlike phrase matching, which is performed at the sentence or multi-word level, part-of-speech tagging is performed at the token level.
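As a rough illustration of the (word, tag) form described above, here is a small sketch that works on a hand-written tagged list, so it runs without NLTK or its data files. The helper name words_with_tag_prefix is hypothetical, and the tags are written by hand in the Penn Treebank style for the sentence from the question.

```python
# Hand-written example data in the (word, tag) form; it is not produced
# by running a tagger here. Tags follow the Penn Treebank convention.
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose tag starts with prefix (e.g. "NN" for nouns)."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")
adverbs = words_with_tag_prefix(tagged, "RB")
```

Matching on a tag prefix rather than an exact tag is a design choice: in the Penn Treebank tag set, related tags such as NN, NNS and NNP share a prefix, so one call can collect all of them.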
Your Python installation is not able to reach maxent or treebank.
First, check whether the tagger is actually there. Start Python from the command line.
>>> import nltk
Then you can check using
>>> dir(nltk)
Look through the list to see whether maxent and treebank are both there.
An easier way is to type
>>> "maxent" in dir(nltk)
True
>>> "treebank" in dir(nltk)
True
Run nltk.download(), open the Models tab, and check whether the treebank tagger shows as installed.
You should also try downloading the tagger again.
If you don't want to use the downloader GUI, you can just use the following commands in a Python or IPython shell:
import nltk
nltk.download('punkt')
nltk.download('maxent_treebank_pos_tagger')
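If it helps, the check-then-download idea can be wrapped in a small helper. This is only a sketch: ensure_resource is a hypothetical name, not part of NLTK, and it assumes that nltk.data.find raises LookupError for a missing resource (as the traceback in the question shows) and that nltk.download accepts a package identifier. The resource names in the usage comments come from the question; newer NLTK releases may use different ones.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Look up an NLTK resource and try to download it if it is missing.

    Returns True if the resource was already present, False if a download
    was attempted. The find and download callables default to
    nltk.data.find and nltk.download; they are parameters only so the
    logic can be exercised without NLTK or network access.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return True
    except LookupError:
        # The resource is not on any of NLTK's search paths.
        download(package_id)
        return False


# Example usage (requires NLTK and possibly network access):
# ensure_resource("tokenizers/punkt", "punkt")
# ensure_resource("taggers/maxent_treebank_pos_tagger/english.pickle",
#                 "maxent_treebank_pos_tagger")
```

If the download succeeds but the lookup still fails, printing nltk.data.path shows the directories NLTK searches, which can reveal that the files were saved somewhere your Python installation does not look.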