
pos_tag in NLTK does not tag sentences correctly

Tags:

nltk

I have used this code:

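# Step 1: tokenize
# (the tokenizer and tagger models may first need to be fetched with nltk.download())
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

text = "John is very nice. Is John very nice?"
words = word_tokenize(text)

# Step 2: POS disambiguation
tags = pos_tag(words)
print(tags)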

to tag two sentences: "John is very nice." and "Is John very nice?"

"John" was tagged NN in the first sentence but VB in the second. How can I correct the pos_tag function without training back-off taggers?

Edit:

I have seen the demonstration of NLTK taggers here: http://text-processing.com/demo/tag/. When I choose the option "English Taggers & Chunkers: Treebank" or "Brown Tagger", I get the correct tags. So how can I use, for example, the Brown Tagger without training it?

asked Dec 03 '11 by user842457




1 Answer

Short answer: you can't. Slightly longer answer: you can override the tags of specific words with a manually created UnigramTagger. See my answer on custom tagging with NLTK for details on this method.
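The override idea can be sketched in plain Python. This is not NLTK's UnigramTagger (which in NLTK is built as `UnigramTagger(model=..., backoff=...)` and consults the backoff word by word); it is a simpler post-processing variant, assuming the base tagger takes a list of tokens and returns `(word, tag)` pairs, as `nltk.pos_tag` does. The names `make_override_tagger` and `toy_base_tagger` are hypothetical, and the toy tagger merely stands in for `pos_tag` to reproduce the mistake from the question.

```python
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]


def make_override_tagger(
    overrides: Dict[str, str],
    base_tagger: Callable[[List[str]], Tagged],
) -> Callable[[List[str]], Tagged]:
    """Hypothetical helper: use `overrides` for listed words and keep
    `base_tagger`'s tag for every other word."""
    def tag(tokens: List[str]) -> Tagged:
        # Tag the whole sentence first so the base tagger still sees full context.
        base = base_tagger(tokens)
        # Replace the tag only where a manual override exists.
        return [(word, overrides.get(word, base_tag)) for word, base_tag in base]
    return tag


def toy_base_tagger(tokens: List[str]) -> Tagged:
    # Hypothetical stand-in for nltk.pos_tag that mis-tags "John" as VB.
    guesses = {"Is": "VBZ", "is": "VBZ", "John": "VB", "very": "RB",
               "nice": "JJ", "?": ".", ".": "."}
    return [(t, guesses.get(t, "NN")) for t in tokens]


tagger = make_override_tagger({"John": "NNP"}, toy_base_tagger)
print(tagger(["Is", "John", "very", "nice", "?"]))
# [('Is', 'VBZ'), ('John', 'NNP'), ('very', 'RB'), ('nice', 'JJ'), ('?', '.')]
```

With NLTK installed, `nltk.pos_tag` could be passed as `base_tagger`. Note that an override is context-free: every occurrence of a listed word gets the same tag, which suits proper names like "John" but not genuinely ambiguous words.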

answered Sep 22 '22 by Jacob


