I have used this code:
# Step 1: tokenize the text
from nltk.tokenize import word_tokenize
text = "John is very nice. Is John very nice?"
words = word_tokenize(text)
# Step 2: assign a POS tag to each token
from nltk.tag import pos_tag
tags = pos_tag(words)
to tag two sentences: John is very nice. Is John very nice?
John was tagged NN in the first sentence but VB in the second! So how can we correct the output of the pos_tag function without training back-off taggers?
Modified question:
I have seen the demonstration of NLTK taggers here: http://text-processing.com/demo/tag/. When I tried the option "English Taggers & Chunkers: Treebank" or "Brown Tagger", I got the correct tags. So how can I use the Brown Tagger, for example, without training it?
Summary. POS tagging in NLTK is the process of marking up each word in a text with its part of speech, based on its definition and context. Examples of NLTK POS tags include CC, CD, EX, JJ, MD, NNP, PDT, PRP$, and TO. A POS tagger assigns this grammatical information to each word of a sentence.
The main problem with POS tagging is ambiguity. In English, many common words have multiple meanings and therefore multiple possible parts of speech. The job of a POS tagger is to resolve this ambiguity accurately based on the context of use. For example, the word "shot" can be a noun or a verb.
JJ   adjective               'big'
JJR  adjective, comparative  'bigger'
JJS  adjective, superlative  'biggest'
Short answer: you can't. Slightly longer answer: you can override specific words using a manually created UnigramTagger. See my answer for custom tagging with nltk for details on this method.
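As a rough illustration of the override idea, here is a minimal sketch that builds a UnigramTagger from a hand-written word-to-tag dictionary. The `overrides` dictionary and the choice of NNP for "John" are assumptions made for this example, and `DefaultTagger("NN")` is only a stand-in backoff so that the snippet runs without downloading any models; in practice you would presumably pass a pretrained tagger as the backoff instead.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical manual overrides: words whose tag we want to force.
overrides = {"John": "NNP"}

# Stand-in backoff for this sketch: tags every other token as NN.
# A real setup would likely use a pretrained tagger here instead.
backoff = DefaultTagger("NN")

# A UnigramTagger built from a model dict needs no training data;
# it looks each token up in the dict and defers to the backoff on a miss.
tagger = UnigramTagger(model=overrides, backoff=backoff)

first = tagger.tag(["John", "is", "very", "nice", "."])
second = tagger.tag(["Is", "John", "very", "nice", "?"])
print(first)
print(second)
```

With this setup, "John" gets the forced tag in both sentences regardless of its position, while every other token receives whatever the backoff tagger returns.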