generally A head of a nounphrase is a noun which is rightmost of the NP as shown below tree is the head of the parent NP. So
ROOT | S ___|________________________ NP | ___|_____________ | | PP VP | ____|____ ____|___ NP | NP | PRT ___|_______ | | | | DT JJ NN NN IN NNP VBD RP | | | | | | | | The old oak tree from India fell down
Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', ['down'])])])])
The following code based on a java implementation uses a simplistic rule to find the head of the NP , but i need to be based on the rules:
parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
def traverse(t):
try:
t.label()
except AttributeError:
return
else:
if t.label()=='NP':
print 'NP:'+str(t.leaves())
print 'NPhead:'+str(t.leaves()[-1])
for child in t:
traverse(child)
else:
for child in t:
traverse(child)
tree=Tree.fromstring(parsestr)
traverse(tree)
The above code gives output:
NP:['The', 'old', 'oak', 'tree', 'from', 'India'] NPhead:India NP:['The', 'old', 'oak', 'tree'] NPhead:tree NP:['India'] NPhead:India
Although now its giving correct output for the sentence given but I need to incorporate a condition that only right most noun is extracted as head , currently it does not check if it were a noun (NN)
print 'NPhead:'+str(t.leaves()[-1])
So something like following in the np head condition in above code:
t.leaves().getrightmostnoun()
Michael Collins dissertation (Appendix A) includes head-finding rules for the Penn Treebank, and hence it is not necessary that only the rightmost noun is the head. Hence the above conditions should incorporate such scenario.
For the following example as given in one of the answers:
(NP (NP the person) that gave (NP the talk)) went home
The head noun of the subject is person but the last leave node of the NP the person that gave the talk is talk.
NLTK Parsers. Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text's “parses”.
Noun Phrase ExtractionThe form of n-gram that takes center stage in NLP context analysis is the noun phrase. Noun phrases are part of speech patterns that include a noun. They can also include whatever other parts of speech make grammatical sense, and can include multiple nouns.
Noun chunks are “base noun phrases” – flat phrases that have a noun as their head. You can think of noun chunks as a noun plus the words describing the noun – for example, “the lavish green grass” or “the world's largest tech fund”. To get the noun chunks in a document, simply iterate over Doc.
There are built-in string to Tree
object in NLTK (http://www.nltk.org/_modules/nltk/tree.html), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.
>>> from nltk.tree import Tree
>>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
... if i.label() == 'NP':
... print i
...
(NP
(NP (DT The) (JJ old) (NN oak) (NN tree))
(PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))
>>> for i in Tree.fromstring(parsestr).subtrees():
... if i.label() == 'NP':
... print i.leaves()
...
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']
Note that it's not always the case that right most noun is the head noun of an NP, e.g.
>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> for i in Tree.fromstring(s).subtrees():
... if i.label() == 'NP':
... print i.leaves()[-1]
...
Magnificent
talk
Arguably, Magnificent
can still be the head noun. Another example is when the NP includes a relative clause:
(NP (NP the person) that gave (NP the talk)) went home
The head noun of the subject is person
but the last leave node of the NP the person that gave the talk
is talk
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With