How can I programmatically get the same dependency parse from Stanford CoreNLP as the one shown in the online demo?
I am using the Stanford CoreNLP package to obtain the dependency parse for the following sentence:
Second healthcare worker in Texas tests positive for Ebola , authorities say .
I try to obtain the parse programmatically using the code below:
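import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.IndexedWord;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.BasicDependenciesAnnotation;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;
import edu.stanford.nlp.util.CoreMap;

public class DependencyParseDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Second healthcare worker in Texas tests positive for Ebola , authorities say ."; // Add your text here!
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            SemanticGraph dependencies = sentence.get(BasicDependenciesAnnotation.class);
            // The root is not stored as an edge, so print it separately.
            IndexedWord root = dependencies.getFirstRoot();
            System.out.printf("root(ROOT-0, %s-%d)%n", root.word(), root.index());
            for (SemanticGraphEdge e : dependencies.edgeIterable()) {
                System.out.printf("%s(%s-%d, %s-%d)%n", e.getRelation().toString(),
                        e.getGovernor().word(), e.getGovernor().index(),
                        e.getDependent().word(), e.getDependent().index());
            }
        }
    }
}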
I get the following output using the Stanford CoreNLP 3.5.0 package:
root(ROOT-0, worker-3)
amod(worker-3, Second-1)
nn(worker-3, healthcare-2)
prep(worker-3, in-4)
amod(worker-3, positive-7)
dep(worker-3, say-12)
pobj(in-4, tests-6)
nn(tests-6, Texas-5)
prep(positive-7, for-8)
pobj(for-8, ebola-9)
nsubj(say-12, authorities-11)
But the online demo gives a different parse: it marks say as the root and includes other relations, such as ccomp, between the words:
amod(worker-3, Second-1)
nn(worker-3, healthcare-2)
nsubj(tests-6, worker-3)
prep(worker-3, in-4)
pobj(in-4, Texas-5)
ccomp(say-12, tests-6)
acomp(tests-6, positive-7)
prep(positive-7, for-8)
pobj(for-8, Ebola-9)
nsubj(say-12, authorities-11)
root(ROOT-0, say-12)
How can I get my output to match the online demo?
The outputs differ because the parser demo runs the stand-alone parser distribution, whereas your code runs the full CoreNLP distribution. Both use the same parser and the same models, but the default configuration of CoreNLP runs a part-of-speech (POS) tagger before the parser, and the parser then incorporates those POS tags, which can lead to different results in some cases.
To get the same results, you can disable the POS tagger by removing pos from the list of annotators:
props.put("annotators", "tokenize, ssplit, parse, lemma, ner, dcoref");
Note, however, that the lemma, ner and dcoref annotators all require POS tags. With pos removed, those tags have to come from the parser, so parse must be moved ahead of lemma, ner and dcoref in the list of annotators.
There is also a CoreNLP demo which should always produce the same output as your code.
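To check whether a configuration change actually brings your output in line with a demo, it can help to compare the two listings as sets of relations rather than by eye. The following is a minimal sketch in plain Java; it is not part of CoreNLP, and the class name DependencyDiff and the method onlyIn are made up for illustration. It assumes each relation is available as one string in the format printed above, and the lists in main are only an excerpt of the two outputs from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a CoreNLP class): compares two dependency listings.
class DependencyDiff {

    // Returns the relations that appear in `left` but not in `right`,
    // keeping the order in which they appear in `left`.
    static Set<String> onlyIn(List<String> left, List<String> right) {
        Set<String> diff = new LinkedHashSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) {
        // Excerpt of the output produced by the code in the question.
        List<String> program = Arrays.asList(
                "root(ROOT-0, worker-3)",
                "amod(worker-3, Second-1)",
                "pobj(in-4, tests-6)",
                "nsubj(say-12, authorities-11)");
        // Excerpt of the output shown by the online demo.
        List<String> demo = Arrays.asList(
                "amod(worker-3, Second-1)",
                "pobj(in-4, Texas-5)",
                "nsubj(say-12, authorities-11)",
                "root(ROOT-0, say-12)");

        System.out.println("Only in program output: " + onlyIn(program, demo));
        System.out.println("Only in demo output: " + onlyIn(demo, program));
    }
}
```

If both calls return an empty set, the two parses contain the same relations. The comparison is an exact string match, so differences in capitalization or token indices also show up as differences.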