NLTK Chunking and walking the results tree

I'm using NLTK's RegexpParser to extract noun groups and verb groups from tagged tokens.

How do I walk the resulting tree to find only the chunks that are NP or V groups?

from nltk.chunk import RegexpParser

grammar = '''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)
tokens = []  # Some (word, tag) tokens from my POS tagger
chunked = chunker.parse(tokens)
print(chunked)

# How do I walk the tree?
# for chunk in chunked:
#     if chunk.??? == 'NP':
#         print(chunk)

(S (NP Carrier/NN) for/IN tissue-/JJ and/CC cell-culture/JJ for/IN (NP the/DT preparation/NN) of/IN (NP implants/NNS) and/CC (NP implant/NN) (V containing/VBG) (NP the/DT carrier/NN) ./.)

Asked by Vincent Theeten on Oct 01 '11 at 08:10



1 Answer

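This should work. Chunks are Tree objects and unchunked tokens are plain (word, tag) tuples, so check the type first and then the label:

import nltk

for n in chunked:
    if isinstance(n, nltk.tree.Tree):
        # n is a chunk (subtree); its label is the chunk type
        if n.label() in ('NP', 'V'):
            do_something_with_subtree(n)
    else:
        # n is an unchunked (word, tag) leaf
        do_something_with_leaf(n)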
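If only the chunks matter and the loose leaves can be ignored, `Tree.subtrees()` with a `filter` callable may be a shorter route. The following is a minimal sketch, not the asker's exact setup: the hand-tagged `tokens`, the helper name `chunks_with_labels`, and the slightly tightened NP pattern (`<NN.*>+` instead of `<NN>*`) are all assumptions made for illustration.

```python
from nltk.chunk import RegexpParser

# Hand-tagged (word, POS) pairs standing in for real tagger output (illustrative).
tokens = [("the", "DT"), ("carrier", "NN"), ("contains", "VBZ"),
          ("the", "DT"), ("implant", "NN"), (".", ".")]

# Assumed grammar: the NP rule requires at least one noun tag.
grammar = r'''
NP: {<DT>?<JJ>*<NN.*>+}
V: {<V.*>}
'''
chunked = RegexpParser(grammar).parse(tokens)


def chunks_with_labels(tree, labels):
    """Return (label, words) for every subtree whose label is in `labels`."""
    wanted = set(labels)
    return [
        (subtree.label(), [word for word, tag in subtree.leaves()])
        for subtree in tree.subtrees(filter=lambda t: t.label() in wanted)
    ]


found = chunks_with_labels(chunked, {"NP", "V"})
for label, words in found:
    print(label, " ".join(words))
```

Unlike the loop over the top-level children, `subtrees()` also descends into nested chunks, which matters if the grammar ever produces chunks inside chunks. The root `S` node is visited too, but the filter skips it because its label is not in the requested set.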

Answered by Savino Sguera on Sep 21 '22 at 08:09