Using the Dependency Parser in Stanford CoreNLP

I am using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) to parse sentences and extract the dependencies between words.

I have managed to create the dependency graph as in the example at the link above, but I don't know how to work with it. I can print the entire graph using the toString() method, but the methods that search for particular words in the graph, such as getChildList, require an IndexedWord object as a parameter. That makes sense, since the nodes of the graph are of type IndexedWord, but it's not clear to me how to create such an object in order to search for a specific node.

For example, I want to find the children of the node that represents the word "problem" in my sentence. How do I create an IndexedWord object that represents the word "problem" so that I can search for it in the graph?

Eddie Dovzhik asked Dec 22 '22 06:12

1 Answer

In general, you shouldn't be creating your own IndexedWord objects. (These are used to represent "word tokens", i.e., particular words in a text, not "word types", and so asking for the word "problem" -- a word type -- isn't really valid; in particular, a sentence could have multiple tokens of this word type.)

There are a couple of convenience methods that let you do what you want:

  • sg.getNodeByWordPattern(String pattern)
  • sg.getAllNodesByWordPattern(String pattern)

The first is a little dangerous, since it just returns the first IndexedWord matching the pattern, or null if there are none. But it's most directly what you asked for.

Some other methods to start from are:

  • sg.getFirstRoot() to find the (first, and usually only) root of the graph; you can then navigate down from there, for example with the sg.getChildren(root) method.
  • sg.vertexSet() to get all of the IndexedWord objects in the graph.
  • sg.getNodeByIndex(int) if you already know the input sentence, and therefore can ask for words by their integer index.

In practice, most of these methods leave you iterating through nodes. The two ...ByWordPattern methods above simply do that iteration for you.
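
To make the token-versus-type point and the "iteration" remark concrete, here is a minimal sketch that uses only the Java standard library. It is not CoreNLP code: Token, DependencyGraphSketch, addToken, addEdge, findAllByWordPattern and example are all hypothetical stand-ins for IndexedWord and SemanticGraph, the edges are written by hand rather than produced by a parser, and matching the whole word against the regular expression is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DependencyGraphSketch {

    // Hypothetical stand-in for IndexedWord: a token is a word plus its position.
    static final class Token {
        final String word;
        final int index; // 1-based position in the sentence

        Token(String word, int index) {
            this.word = word;
            this.index = index;
        }

        @Override
        public String toString() {
            return word + "-" + index;
        }
    }

    // Head token -> its dependents, kept in insertion order.
    private final Map<Token, List<Token>> children = new LinkedHashMap<>();

    Token addToken(String word, int index) {
        Token t = new Token(word, index);
        children.put(t, new ArrayList<>());
        return t;
    }

    void addEdge(Token head, Token dependent) {
        children.get(head).add(dependent);
    }

    List<Token> getChildren(Token head) {
        return children.get(head);
    }

    // Scan every node and keep the tokens whose word fully matches the regex.
    List<Token> findAllByWordPattern(String regex) {
        Pattern p = Pattern.compile(regex);
        List<Token> hits = new ArrayList<>();
        for (Token t : children.keySet()) {
            if (p.matcher(t.word).matches()) {
                hits.add(t);
            }
        }
        return hits;
    }

    // Hand-built toy graph for "The problem is a hard problem" (not parser output).
    static DependencyGraphSketch example() {
        DependencyGraphSketch g = new DependencyGraphSketch();
        Token the = g.addToken("The", 1);
        Token subject = g.addToken("problem", 2);
        Token is = g.addToken("is", 3);
        Token a = g.addToken("a", 4);
        Token hard = g.addToken("hard", 5);
        Token root = g.addToken("problem", 6);
        g.addEdge(subject, the);
        g.addEdge(root, subject);
        g.addEdge(root, is);
        g.addEdge(root, a);
        g.addEdge(root, hard);
        return g;
    }

    public static void main(String[] args) {
        DependencyGraphSketch g = example();
        // One word type ("problem"), two tokens, two different child lists.
        for (Token t : g.findAllByWordPattern("problem")) {
            System.out.println(t + " -> " + g.getChildren(t));
        }
    }
}
```

Running main prints one line per token: problem-2 -> [The-1], then problem-6 -> [problem-2, is-3, a-4, hard-5]. The same word type yields two nodes with different children, which is why a lookup that returns only the first match can silently pick the wrong one.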

Christopher Manning answered Jan 07 '23 12:01