
POS tagging - NLTK thinks noun is adjective

Tags: python, nltk

In the following code, why does nltk think 'fish' is an adjective and not a noun?

>>> import nltk
>>> s = "a woman needs a man like a fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'JJ'), ('needs', 'NNS'), ('a', 'DT'), ('bicycle', 'NN')]
waigani asked Nov 23 '12 13:11


2 Answers

If you first used a lookup tagger as described in chapter 5 of the NLTK book (for example, with WordNet as the lookup reference), your tagger would already "know" that fish cannot be an adjective. For words with several possible POS tags, you could then use a statistical tagger as a backoff tagger.
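A minimal sketch of the lookup-then-backoff idea, in plain Python so it runs without NLTK or any corpus download. The `LOOKUP` table, `backoff_tag`, and `lookup_tag` are hypothetical names made up for illustration, and the suffix rule is only a crude stand-in for a real statistical tagger. In NLTK itself, the analogous construction is probably something like `nltk.UnigramTagger(model=..., backoff=...)`, as the book's chapter 5 describes.

```python
# Hypothetical hand-made lookup table. A real one would be built from a
# tagged corpus or a lexicon such as WordNet.
LOOKUP = {
    "a": "DT",
    "woman": "NN",
    "man": "NN",
    "like": "IN",
    "fish": "NN",
    "bicycle": "NN",
}


def backoff_tag(word):
    """Stand-in for a statistical backoff tagger (assumption: a crude
    suffix heuristic, used here only to keep the sketch self-contained)."""
    return "VBZ" if word.endswith("s") else "NN"


def lookup_tag(tokens, lookup=LOOKUP, backoff=backoff_tag):
    """Tag each token from the lookup table; fall back to `backoff`
    only for words the table does not contain."""
    tagged = []
    for word in tokens:
        tag = lookup.get(word.lower())
        if tag is None:
            tag = backoff(word)
        tagged.append((word, tag))
    return tagged


s = "a woman needs a man like a fish needs a bicycle"
tagged = lookup_tag(s.split())
print(tagged)
```

Because "fish" is found in the table, the backoff never gets a chance to call it an adjective; only "needs", which the table lacks, is left to the fallback.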

Suzana answered Sep 27 '22 21:09


I am not sure what the workaround is, but you can check the tagger source here: https://nltk.googlecode.com/svn/trunk/nltk/nltk/tag/

Meanwhile, I tried your sentence with a slightly different approach, splitting it into two sentences.

>>> s = "a woman needs a man. A fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man.', 'NP'), ('A', 'NNP'), ('fish', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('bicycle', 'NN')]

This time, fish was tagged as "NN".

Chandan Gupta answered Sep 27 '22 22:09