when I chunk text, I get lots of codes in the output like
NN, VBD, IN, DT, NNS, RB
.
Is there a list documented somewhere which tells me the meaning of these?
I have tried googling nltk chunk code
nltk chunk grammar
nltk chunk tokens
.
But I am not able to find any documentation which explains what these codes mean.
NN: Noun, singular or mass. NNS: Noun, plural. PP: Preposition Phrase.
NN (noun singular) and NNP (proper noun singular) are the tags for singular nouns and NNS (noun plural) and NNPS (proper noun plural) are the tags for plural nouns.
How does POS Tagging works? POS tagging is a supervised learning solution that uses features like the previous word, next word, is first letter capitalized etc. NLTK has a function to get pos tags and it works after tokenization process. The most popular tag set is Penn Treebank tagset.
The tags that you see are not a result of the chunks but the POS tagging that happens before chunking. It's the Penn Treebank tagset, see https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sent = "This is a Foo Bar sentence."
# POS tag.
>>> nltk.pos_tag(word_tokenize(sent))
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('Foo', 'NNP'), ('Bar', 'NNP'), ('sentence', 'NN'), ('.', '.')]
>>> tagged_sent = nltk.pos_tag(word_tokenize(sent))
# Chunk.
>>> ne_chunk(tagged_sent)
Tree('S', [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), Tree('ORGANIZATION', [('Foo', 'NNP'), ('Bar', 'NNP')]), ('sentence', 'NN'), ('.', '.')])
To get the chunks look for subtrees within the chunked outputs. From the above output, the Tree('ORGANIZATION', [('Foo', 'NNP'), ('Bar', 'NNP')])
indicates the chunk.
This tutorial site is pretty helpful to explain the chunking process in NLTK: http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dongqing-chunking.pdf.
For official documentation, see http://www.nltk.org/howto/chunk.html
Even though the above links have all kinds. But hope this is still helpful for someone, added a few that are missed on other links.
CC: Coordinating conjunction
CD: Cardinal number
DT: Determiner
EX: Existential there
FW: Foreign word
IN: Preposition or subordinating conjunction
JJ: Adjective
VP: Verb Phrase
JJR: Adjective, comparative
JJS: Adjective, superlative
LS: List item marker
MD: Modal
NN: Noun, singular or mass
NNS: Noun, plural
PP: Preposition Phrase
NNP: Proper noun, singular Phrase
NNPS: Proper noun, plural
PDT: Pre determiner
POS: Possessive ending
PRP: Personal pronoun Phrase
PRP: Possessive pronoun Phrase
RB: Adverb
RBR: Adverb, comparative
RBS: Adverb, superlative
RP: Particle
S: Simple declarative clause
SBAR: Clause introduced by a (possibly empty) subordinating conjunction
SBARQ: Direct question introduced by a wh-word or a wh-phrase.
SINV: Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal.
SQ: Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ.
SYM: Symbol
VBD: Verb, past tense
VBG: Verb, gerund or present participle
VBN: Verb, past participle
VBP: Verb, non-3rd person singular present
VBZ: Verb, 3rd person singular present
WDT: Wh-determiner
WP: Wh-pronoun
WP: Possessive wh-pronoun
WRB: Wh-adverb
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With