I'm using NLTK's RegexpParser to extract noun groups and verb groups from POS-tagged tokens.
How do I walk the resulting tree to find only the chunks labeled NP or V?
import nltk
from nltk.chunk import RegexpParser

grammar = r'''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}
'''
chunker = RegexpParser(grammar)
tokens = []  # (word, tag) pairs from my POS tagger
chunked = chunker.parse(tokens)
print(chunked)

# How do I walk the tree?
# for chunk in chunked:
#     if chunk.??? == 'NP':
#         print(chunk)
(S (NP Carrier/NN) for/IN tissue-/JJ and/CC cell-culture/JJ for/IN (NP the/DT preparation/NN) of/IN (NP implants/NNS) and/CC (NP implant/NN) (V containing/VBG) (NP the/DT carrier/NN) ./.)
NLTK's chunk module provides classes and interfaces for identifying non-overlapping linguistic groups (such as base noun phrases) in unrestricted text. This task is called “chunk parsing” or “chunking”, and the identified groups are called “chunks”. The chunked text is represented using a shallow tree called a “chunk structure.”
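To make the idea of a chunk structure concrete, here is a small sketch that builds one by hand from a fragment of the question's output and flattens it with tree2conlltags from nltk.chunk, which turns each token into a (word, tag, IOB-tag) triple. The tokens are illustrative, not the result of a real tagger run.

```python
from nltk import Tree
from nltk.chunk import tree2conlltags

# A hand-built chunk structure: a flat S root whose children are either
# (word, tag) leaves or one-level NP/V chunk subtrees (illustrative data).
chunk_structure = Tree('S', [
    Tree('NP', [('implant', 'NN')]),
    Tree('V', [('containing', 'VBG')]),
    Tree('NP', [('the', 'DT'), ('carrier', 'NN')]),
    ('.', '.'),
])

# tree2conlltags flattens the shallow tree into (word, tag, IOB) triples:
# B- marks the first token of a chunk, I- a continuation, O a token outside any chunk.
triples = tree2conlltags(chunk_structure)
for triple in triples:
    print(triple)
```

Each chunk is only one level deep, which is why a flat IOB encoding can represent it without loss.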
A Tree represents a hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree. A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value and a subtree is a nested Tree.
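A minimal sketch of that child structure, assuming a hand-built chunk structure whose leaves are (word, tag) tuples (the leaf type RegexpParser produces for tagged input). Iterating a Tree yields its children in order, and isinstance distinguishes leaves from subtrees.

```python
from nltk import Tree

# Illustrative chunk structure with (word, tag) tuple leaves.
t = Tree('S', [
    Tree('NP', [('the', 'DT'), ('carrier', 'NN')]),
    ('of', 'IN'),
    Tree('NP', [('implants', 'NNS')]),
])

# Tree subclasses list, so iterating yields the direct children in order.
kinds = []
for child in t:
    if isinstance(child, Tree):
        kinds.append(('subtree', child.label()))  # nested Tree: record its label
    else:
        kinds.append(('leaf', child))             # basic (non-tree) value
print(kinds)

# leaves() recursively collects every leaf, ignoring the grouping.
print(t.leaves())
```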
Chunking is the natural language processing step that groups tokens, which have already been POS-tagged, into short non-overlapping phrases such as noun phrases and verb groups.
RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString.
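A sketch of the question's two-stage grammar run on a hard-coded tagged sentence, so no POS tagger is needed. The tagged tokens are an illustrative assumption. RegexpParser applies the stages in order: the NP rule chunks first, then the V rule runs over the partially chunked result.

```python
from nltk import RegexpParser, Tree

# The grammar from the question: one stage per line.
grammar = r'''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}
'''
chunker = RegexpParser(grammar)

# Illustrative (word, tag) input standing in for real tagger output.
tagged = [('implant', 'NN'), ('containing', 'VBG'),
          ('the', 'DT'), ('carrier', 'NN'), ('.', '.')]
chunked = chunker.parse(tagged)
print(chunked)

# Top-level chunk labels, skipping unchunked (word, tag) leaves.
labels = [c.label() for c in chunked if isinstance(c, Tree)]
print(labels)
```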
This should work:
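import nltk

for n in chunked:
    if isinstance(n, nltk.tree.Tree):  # a chunk subtree such as (NP ...) or (V ...)
        if n.label() == 'NP':  # use n.label() in ('NP', 'V') to match both groups
            do_something_with_subtree(n)  # placeholder for your own handler
    else:  # a bare (word, tag) leaf outside any chunk
        do_something_with_leaf(n)  # placeholder for your own handler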
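An alternative sketch, assuming the same illustrative tagged sentence as before: Tree.subtrees() accepts a filter function and yields matching subtrees in preorder, which collects both NP and V chunks without a manual isinstance check. It also visits the root S, so the filter has to exclude it, here by testing the label.

```python
from nltk import RegexpParser

chunker = RegexpParser(r'''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}
''')
# Illustrative tagged input, not real tagger output.
tagged = [('implant', 'NN'), ('containing', 'VBG'),
          ('the', 'DT'), ('carrier', 'NN'), ('.', '.')]
chunked = chunker.parse(tagged)

# Keep only subtrees whose label is NP or V; the root 'S' fails the filter.
wanted = {'NP', 'V'}
chunks = [(st.label(), st.leaves())
          for st in chunked.subtrees(filter=lambda st: st.label() in wanted)]

for label, leaves in chunks:
    print(label, ' '.join(word for word, tag in leaves))
```

Because subtrees() recurses, this also finds chunks nested below the top level, which the plain for-loop over chunked does not.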