Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read constituency based parse tree

I have a corpus of sentences that were preprocessed by Stanford's CoreNLP systems. One of the things it provides is the sentence's Parse Tree (Constituency-based). While I can understand a parse tree when it's drawn (like a tree), I'm not sure how to read it in this format:

E.g.:

          (ROOT
          (FRAG
          (NP (NN sent28))
          (: :)
          (S
          (NP (NNP Rome))
          (VP (VBZ is)
          (PP (IN in)
          (NP
          (NP (NNP Lazio) (NN province))
          (CC and)
          (NP
          (NP (NNP Naples))
          (PP (IN in)
          (NP (NNP Campania))))))))
          (. .)))

The original sentence is:

sent28: Rome is in Lazio province and Naples in Campania .

How am I supposed to read this tree, or alternatively, is there a code (in python) that does it properly? Thanks.

like image 356
Cheshie Avatar asked Feb 23 '15 13:02

Cheshie


People also ask

What is constituency parse tree?

The constituency parse tree is based on the formalism of context-free grammars. In this type of tree, the sentence is divided into constituents, that is, sub-phrases that belong to a specific category in the grammar.

What is constituency parse?

Constituency Parsing is the process of analyzing the sentences by breaking down it into sub-phrases also known as constituents. These sub-phrases belong to a specific category of grammar like NP (noun phrase) and VP(verb phrase).

What is the difference between dependency parsing and constituency parsing?

Dependency parsing displays only relationships between words and their constitutes while constituency parsing displays the entire sentence structure and relationships. Often dependency parsing is praised for being concise yet informative, but constituency parsing is often easier to read and understand.


1 Answers

NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. You can then iterate its subtrees, leaves, etc...

As an aside: you might want to remove the bit that says sent28: as it confuses the parser (it's also not a part of the sentence). You are not getting a full parse tree, but just a sentence fragment.

like image 135
mbatchkarov Avatar answered Oct 17 '22 13:10

mbatchkarov