
What is NLTK POS tagger asking me to download?

I just started using a part-of-speech tagger, and I am facing many problems.

I started POS tagging with the following:

    import nltk
    text = nltk.word_tokenize("We are going out.Just you and me.")

When I try to print the POS tags for 'text', the following happens:

    print nltk.pos_tag(text)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "F:\Python26\lib\site-packages\nltk\tag\__init__.py", line 63, in pos_tag
        tagger = nltk.data.load(_POS_TAGGER)
      File "F:\Python26\lib\site-packages\nltk\data.py", line 594, in load
        resource_val = pickle.load(_open(resource_url))
      File "F:\Python26\lib\site-packages\nltk\data.py", line 673, in _open
        return find(path).open()
      File "F:\Python26\lib\site-packages\nltk\data.py", line 455, in find
        raise LookupError(resource_not_found)
    LookupError:
      Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
      found.  Please use the NLTK Downloader to obtain the resource:
      >>> nltk.download().
      Searched in:
        - 'C:\\Documents and Settings\\Administrator/nltk_data'
        - 'C:\\nltk_data'
        - 'D:\\nltk_data'
        - 'E:\\nltk_data'
        - 'F:\\Python26\\nltk_data'
        - 'F:\\Python26\\lib\\nltk_data'
        - 'C:\\Documents and Settings\\Administrator\\Application Data\\nltk_data'

I used nltk.download() but it did not work.

Pearl asked Dec 21 '11 13:12




2 Answers

For NLTK versions higher than v3.2, please use:

    >>> import nltk
    >>> nltk.__version__
    '3.2.1'
    >>> nltk.download('averaged_perceptron_tagger')
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data]     /home/alvas/nltk_data...
    [nltk_data]   Package averaged_perceptron_tagger is already up-to-date!
    True

For NLTK versions using the old MaxEnt model, i.e. v3.1 and below, please use:

    >>> import nltk
    >>> nltk.download('maxent_treebank_pos_tagger')
    [nltk_data] Downloading package maxent_treebank_pos_tagger to
    [nltk_data]     /home/alvas/nltk_data...
    [nltk_data]   Package maxent_treebank_pos_tagger is already up-to-date!
    True

For more details on the change in the default pos_tag, please see https://github.com/nltk/nltk/pull/1143
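The version-dependent choice above could be automated along these lines. This is only a sketch: `tagger_resource_for` and `ensure_tagger` are hypothetical helper names, and treating v3.2 itself like the newer releases is an assumption, since the answer only mentions "higher than v3.2" and "v3.1 and below". If your release expects a differently named package, the LookupError message states the exact resource to download.

```python
import re


def tagger_resource_for(version):
    """Return the tagger package name for an NLTK version string such as '3.2.1'.

    Hypothetical helper. The cut-off follows the answer above: v3.1 and
    below use the MaxEnt model. Treating v3.2 like later versions is an
    assumption.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor <= (3, 1):
        return "maxent_treebank_pos_tagger"
    return "averaged_perceptron_tagger"


def ensure_tagger():
    """Download the tagger model only if NLTK cannot already find it."""
    import nltk  # imported lazily so the helper above works without NLTK

    name = tagger_resource_for(nltk.__version__)
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("taggers/" + name)
    except LookupError:
        nltk.download(name)
    return name
```

Calling `ensure_tagger()` once before `nltk.pos_tag(...)` should avoid the LookupError from the question, provided the machine can reach the NLTK download server.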

alvas answered Sep 27 '22 01:09


When you call nltk.download() in Python, the NLTK Downloader interface opens.
Click on Models and choose maxent_treebank_pos_. It is then installed automatically.

    import nltk
    text=nltk.word_tokenize("We are going out.Just you and me.")
    print nltk.pos_tag(text)
    [('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'JJ'),
     ('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]
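Note that the output above contains the single token 'out.Just', tagged JJ. This appears to come from the missing space after the first period in the input string rather than from the tagger. One possible workaround, sketched below with a hypothetical helper name, is to insert the space before tokenizing. The regular expression is deliberately naive and would also split abbreviations such as "U.S.A", so treat it as an illustration only.

```python
import re


def space_after_sentence_end(text):
    """Insert a space after '.', '!' or '?' when a capital letter follows directly.

    Hypothetical helper; it does not try to recognise abbreviations.
    """
    return re.sub(r"([.!?])(?=[A-Z])", r"\1 ", text)


fixed = space_after_sentence_end("We are going out.Just you and me.")
print(fixed)  # We are going out. Just you and me.
```

Passing `fixed` to `nltk.word_tokenize` instead of the raw string should then let the tokenizer see "out", "." and "Just" as separate tokens.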
Pearl answered Sep 23 '22 01:09