I am going to use Stanford CoreNLP (the 2013 release) to find phrase heads. I saw this thread, but the answer was not clear to me and I couldn't add a comment to continue it, so I'm sorry for the duplication.
What I have at the moment is the parse tree of a sentence, produced by Stanford CoreNLP (I also tried the CoNLL format that Stanford CoreNLP can output). What I need is exactly the heads of the noun phrases.
I don't know how to use the dependencies and the parse tree to extract the heads of noun phrases.
What I know is that if I have nsubj(x, y), y is the head of the subject. If I have dobj(x, y), y is the head of the direct object. If I have iobj(x, y), y is the head of the indirect object.
However, I am not sure whether this is the correct way to find all phrase heads. If it is, which rules should I add to get the heads of all noun phrases?
It may be worth saying that I need the heads of the noun phrases in Java code.
Since I couldn't comment on the answer given by Chaitanya, I am adding to his answer here.
The Stanford CoreNLP suite implements the Collins head finder heuristics and a semantic head finder heuristic in the form of the classes CollinsHeadFinder, ModCollinsHeadFinder and SemanticHeadFinder (package edu.stanford.nlp.trees).
All you need to do is instantiate one of the three and do the following.
Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
headFinder.determineHead(tree).pennPrint(out);
You can iterate through the nodes of the tree and determine head words wherever required.
PS: My answer is based on the StanfordCoreNLP suite released as of 20140104.
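To make it clearer what such a head finder does, here is a minimal, self-contained sketch of a heavily cut-down Collins-style rule for NPs. It is not CoreNLP code: the `NpHeadSketch` class, its `Node` type and the two-step rule (rightmost `NN*` child, otherwise leftmost `NP` child, otherwise the last child) are assumptions made only for illustration. The real CollinsHeadFinder uses a much larger rule table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from CoreNLP.
final class NpHeadSketch {

    /** Toy constituency node: a label plus children; a leaf (a word) has no children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Simplified head rule: rightmost NN* child, else leftmost NP child, else last child. */
    static Node headChild(Node phrase) {
        List<Node> kids = phrase.children;
        for (int i = kids.size() - 1; i >= 0; i--) {
            if (kids.get(i).label.startsWith("NN")) {
                return kids.get(i);
            }
        }
        for (Node kid : kids) {
            if (kid.label.equals("NP")) {
                return kid;
            }
        }
        return kids.get(kids.size() - 1);
    }

    /** Follows head children down to a leaf and returns that word. */
    static String headWord(Node node) {
        Node current = node;
        while (!current.isLeaf()) {
            current = headChild(current);
        }
        return current.label;
    }

    /** Collects the head word of every NP in the tree, in pre-order. */
    static List<String> npHeads(Node root) {
        List<String> heads = new ArrayList<>();
        collect(root, heads);
        return heads;
    }

    private static void collect(Node node, List<String> heads) {
        if (node.isLeaf()) {
            return;
        }
        if (node.label.equals("NP")) {
            heads.add(headWord(node));
        }
        for (Node child : node.children) {
            collect(child, heads);
        }
    }

    public static void main(String[] args) {
        // (S (NP (DT The) (JJ quick) (NN fox))
        //    (VP (VBD ate) (NP (NP (DT the) (NN cover)) (PP (IN of) (NP (DT the) (NN book))))))
        Node tree = new Node("S",
            new Node("NP",
                new Node("DT", new Node("The")),
                new Node("JJ", new Node("quick")),
                new Node("NN", new Node("fox"))),
            new Node("VP",
                new Node("VBD", new Node("ate")),
                new Node("NP",
                    new Node("NP",
                        new Node("DT", new Node("the")),
                        new Node("NN", new Node("cover"))),
                    new Node("PP",
                        new Node("IN", new Node("of")),
                        new Node("NP",
                            new Node("DT", new Node("the")),
                            new Node("NN", new Node("book")))))));
        System.out.println(npHeads(tree)); // prints [fox, cover, cover, book]
    }
}
```

Note that the complex NP "the cover of the book" gets the head "cover", because the rule prefers the leftmost NP child over the PP. With CoreNLP itself you would not write such rules by hand but pass one of the provided head finders to the tree methods.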
Here is a simple DFS that lets you extract the head words of all noun phrases in a sentence:
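    import java.util.List;
    import edu.stanford.nlp.trees.HeadFinder;
    import edu.stanford.nlp.trees.Tree;

    // Depth-first walk over the parse tree; call as dfs(root, null, headFinder).
    public static void dfs(Tree node, Tree parent, HeadFinder headFinder) {
        if (node == null || node.isLeaf()) {
            return;
        }
        // If the node is an NP, print its words (the leaves) and its head word.
        if (node.value().equals("NP")) {
            System.out.println(" Noun Phrase is ");
            List<Tree> leaves = node.getLeaves();
            for (Tree leaf : leaves) {
                System.out.print(leaf.value() + " ");
            }
            System.out.println();
            System.out.println(" Head string is ");
            // headTerminal follows head children down to the head leaf.
            System.out.println(node.headTerminal(headFinder, parent));
        }
        // Recurse into the children so nested NPs are reported as well.
        for (Tree child : node.children()) {
            dfs(child, node, headFinder);
        }
    }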
You can extract the phrase of interest as an object of the class Tree. You can then call the determineHead(Tree t) method of any class that implements the HeadFinder interface.