I just started using a part-of-speech tagger, and I am facing many problems.
I started POS tagging with the following:
    import nltk
    text = nltk.word_tokenize("We are going out.Just you and me.")
When I try to print the POS tags for 'text', the following happens:
    print nltk.pos_tag(text)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "F:\Python26\lib\site-packages\nltk\tag\__init__.py", line 63, in pos_tag
        tagger = nltk.data.load(_POS_TAGGER)
      File "F:\Python26\lib\site-packages\nltk\data.py", line 594, in load
        resource_val = pickle.load(_open(resource_url))
      File "F:\Python26\lib\site-packages\nltk\data.py", line 673, in _open
        return find(path).open()
      File "F:\Python26\lib\site-packages\nltk\data.py", line 455, in find
        raise LookupError(resource_not_found)
    LookupError:
      Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not found.
      Please use the NLTK Downloader to obtain the resource: >>> nltk.download().
      Searched in:
        - 'C:\\Documents and Settings\\Administrator/nltk_data'
        - 'C:\\nltk_data'
        - 'D:\\nltk_data'
        - 'E:\\nltk_data'
        - 'F:\\Python26\\nltk_data'
        - 'F:\\Python26\\lib\\nltk_data'
        - 'C:\\Documents and Settings\\Administrator\\Application Data\\nltk_data'
I used nltk.download(), but it did not work.
Parts of Speech (POS) Tagging. Part-of-speech tagging assigns a part of speech to each individual word in a sentence. Unlike phrase matching, which operates at the sentence or multi-word level, part-of-speech tagging operates at the token level.
Part-of-speech (POS) tagging is a common Natural Language Processing task that assigns each word in a text (corpus) a part of speech, based on the definition of the word and its context.
1.2 Limitations of Current POS Tagging System
A limitation of this system is that a word not present in the corpus is tagged with the unknown tag, "UNK". Hence, the accuracy of the system degrades as the number of unknown words increases.
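To make the unknown-word limitation concrete, here is a minimal sketch of a lookup-style tagger. It is not the system described above: the function names (`build_lexicon`, `tag`) and the tiny training list are hypothetical, chosen only to show how any word missing from the training corpus falls back to "UNK".

```python
def build_lexicon(tagged_words):
    """Map each word to the tag it carries most often in the training data."""
    counts = {}
    for word, tag_name in tagged_words:
        tag_counts = counts.setdefault(word, {})
        tag_counts[tag_name] = tag_counts.get(tag_name, 0) + 1
    return {word: max(tags, key=tags.get) for word, tags in counts.items()}


def tag(tokens, lexicon, unknown_tag="UNK"):
    """Tag each token from the lexicon; unseen words get the unknown tag."""
    return [(token, lexicon.get(token, unknown_tag)) for token in tokens]


# Hypothetical toy training corpus (illustration only).
training = [("We", "PRP"), ("are", "VBP"), ("going", "VBG"),
            ("you", "PRP"), ("and", "CC"), ("me", "PRP")]
lexicon = build_lexicon(training)

# "leaving" never appears in the training data, so it is tagged UNK.
print(tag(["We", "are", "leaving"], lexicon))
# [('We', 'PRP'), ('are', 'VBP'), ('leaving', 'UNK')]
```

The more tokens that miss the lexicon, the more "UNK" tags appear, which is why accuracy drops as unknown words increase.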
For NLTK versions higher than v3.2, please use:
    >>> import nltk
    >>> nltk.__version__
    '3.2.1'
    >>> nltk.download('averaged_perceptron_tagger')
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data]     /home/alvas/nltk_data...
    [nltk_data]   Package averaged_perceptron_tagger is already up-to-date!
    True
For NLTK versions using the old MaxEnt model, i.e. v3.1 and below, please use:
    >>> import nltk
    >>> nltk.download('maxent_treebank_pos_tagger')
    [nltk_data] Downloading package maxent_treebank_pos_tagger to
    [nltk_data]     /home/alvas/nltk_data...
    [nltk_data]   Package maxent_treebank_pos_tagger is already up-to-date!
    True
For more details on the change in the default pos_tag, please see https://github.com/nltk/nltk/pull/1143
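If the same script has to run against different NLTK installs, the choice between the two packages can be automated. The following is only a sketch: the helper names `tagger_package_for` and `download_default_tagger` are hypothetical, and the version cutoff simply follows the rule stated in this answer (v3.1 and below use the MaxEnt model). Treating every later version, including v3.2 itself, as using the averaged perceptron tagger is an assumption, and newer releases may expect a differently named package.

```python
import re


def tagger_package_for(version):
    """Pick the tagger package name for an NLTK version string.

    Assumption (from the answer above): v3.1 and below use the MaxEnt
    model; every later version is treated as using the averaged
    perceptron tagger.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor <= (3, 1):
        return "maxent_treebank_pos_tagger"
    return "averaged_perceptron_tagger"


def download_default_tagger():
    """Download the package matching the installed NLTK version."""
    import nltk  # imported here so the helper above works without NLTK
    return nltk.download(tagger_package_for(nltk.__version__))
```

Calling `download_default_tagger()` once before `nltk.pos_tag(...)` should then fetch whichever model the rule above selects; `nltk.download` returns True on success, as in the sessions shown earlier.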
When you type nltk.download() in Python, the NLTK Downloader interface is displayed automatically.
Click on Models and choose maxent_treebank_pos_. It gets installed automatically.
    import nltk
    text = nltk.word_tokenize("We are going out.Just you and me.")
    print nltk.pos_tag(text)
    [('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'JJ'), ('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]