In the following code, why does NLTK tag 'fish' as an adjective (JJ) and not as a noun?
>>> import nltk
>>> s = "a woman needs a man like a fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'JJ'), ('needs', 'NNS'), ('a', 'DT'), ('bicycle', 'NN')]
If you first used a lookup tagger, as described in chapter 5 of the NLTK book (for example with WordNet as the lookup reference), your tagger would already "know" that fish cannot be an adjective. For words with several possible POS tags, you could then use a statistical tagger as a backoff tagger.
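To make the idea concrete, here is a rough sketch in plain Python rather than a definitive implementation. The lexicon, `constrain_tags`, and `stand_in_tagger` are all hypothetical names made up for illustration; the stand-in simply replays the tags from the question so the example runs without any trained model. In NLTK itself, the roughly analogous building block would be `nltk.UnigramTagger(model=..., backoff=...)` from chapter 5 of the book.

```python
from typing import Callable, Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]

# Hypothetical hand-written lexicon: for each known word, the tags it may
# take, most preferred first. A real one could be derived from WordNet.
LEXICON: Dict[str, List[str]] = {
    "a": ["DT"],
    "woman": ["NN"],
    "man": ["NN", "VB"],
    "fish": ["NN", "VB"],
    "needs": ["VBZ", "NNS"],
    "bicycle": ["NN"],
}


def constrain_tags(
    tokens: List[str],
    backoff: Callable[[List[str]], TaggedSentence],
) -> TaggedSentence:
    """Keep the backoff tagger's guess only if the lexicon allows it."""
    tagged: TaggedSentence = []
    for token, guess in backoff(tokens):
        allowed = LEXICON.get(token.lower())
        if allowed is None or guess in allowed:
            # Unknown word, or a guess the lexicon permits: trust the backoff.
            tagged.append((token, guess))
        else:
            # The guess is impossible for this word: use the preferred tag.
            tagged.append((token, allowed[0]))
    return tagged


def stand_in_tagger(tokens: List[str]) -> TaggedSentence:
    """Stand-in for a statistical tagger: replays the tags from the question."""
    replay = ["DT", "NN", "VBZ", "DT", "NN", "IN", "DT", "JJ", "NNS", "DT", "NN"]
    return list(zip(tokens, replay))


sentence = "a woman needs a man like a fish needs a bicycle".split()
result = constrain_tags(sentence, stand_in_tagger)
print(result)
```

With this lexicon, 'fish' can no longer come out as JJ. Note that the second 'needs' stays NNS, because NNS is a legitimate tag for that word; a lookup table can only rule out impossible tags, it cannot pick the right one among the possible ones.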
I am not sure what the workaround is, but you can check the source here: https://nltk.googlecode.com/svn/trunk/nltk/nltk/tag/
Meanwhile, I tried your sentence with a slightly different approach.
>>> s = "a woman needs a man. A fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man.', 'NP'), ('A', 'NNP'), ('fish', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('bicycle', 'NN')]
which resulted in fish as "NN".