How to get dependency parse output exactly as online demo?

How can I programmatically get the same dependency parse from Stanford CoreNLP that the online demo shows?

I am using the CoreNLP package to obtain the dependency parse for the following sentence.

Second healthcare worker in Texas tests positive for Ebola , authorities say .

I try to obtain the parse programmatically using the code below:

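    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.IndexedWord;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.BasicDependenciesAnnotation;
    import edu.stanford.nlp.semgraph.SemanticGraphEdge;
    import edu.stanford.nlp.util.CoreMap;

    public class DependencyParseExample { // class name is arbitrary
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            String text = "Second healthcare worker in Texas tests positive for Ebola , authorities say ."; // Add your text here!
            Annotation document = new Annotation(text);
            pipeline.annotate(document);

            List<CoreMap> sentences = document.get(SentencesAnnotation.class);
            for (CoreMap sentence : sentences) {
                // Basic (uncollapsed) dependency graph for this sentence
                SemanticGraph dependencies = sentence.get(BasicDependenciesAnnotation.class);
                IndexedWord root = dependencies.getFirstRoot();
                System.out.printf("root(ROOT-0, %s-%d)%n", root.word(), root.index());
                // Print each edge as relation(governor-index, dependent-index)
                for (SemanticGraphEdge e : dependencies.edgeIterable()) {
                    System.out.printf("%s(%s-%d, %s-%d)%n", e.getRelation().toString(), e.getGovernor().word(), e.getGovernor().index(), e.getDependent().word(), e.getDependent().index());
                }
            }
        }
    }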

With the Stanford CoreNLP 3.5.0 package, I get the following output:

root(ROOT-0, worker-3)
amod(worker-3, Second-1)
nn(worker-3, healthcare-2)
prep(worker-3, in-4)
amod(worker-3, positive-7)
dep(worker-3, say-12)
pobj(in-4, tests-6)
nn(tests-6, Texas-5)
prep(positive-7, for-8)
pobj(for-8, ebola-9)
nsubj(say-12, authorities-11)

The online demo, however, gives a different parse: it marks "say" as the root and includes other relations, such as ccomp, between the words.

amod(worker-3, Second-1)
nn(worker-3, healthcare-2)
nsubj(tests-6, worker-3)
prep(worker-3, in-4)
pobj(in-4, Texas-5)
ccomp(say-12, tests-6)
acomp(tests-6, positive-7)
prep(positive-7, for-8)
pobj(for-8, Ebola-9)
nsubj(say-12, authorities-11)
root(ROOT-0, say-12)

How can I get my output to match the online demo?

Sandeep Soni asked Jan 09 '15 17:01


1 Answer

The outputs differ because the parser demo uses the stand-alone parser distribution, whereas your code uses the entire CoreNLP distribution. Both use the same parser and the same models, but the default configuration of CoreNLP runs a part-of-speech (POS) tagger before the parser, and the parser incorporates the POS information, which can lead to different results in some cases.

To get the same results, you can disable the POS tagger by removing pos from the list of annotators:

props.put("annotators", "tokenize, ssplit, parse, lemma, ner, dcoref");

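Note, however, that the lemma, ner and dcoref annotators all require POS tags, so you also have to change the order of the annotators: parse now has to come before them, because without the tagger the POS tags come from the parser.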

There is also a CoreNLP demo which should always produce the same output as your code.
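
To check whether the output of your code actually matches a demo, it can help to diff the two lists of relation(governor-index, dependent-index) lines instead of comparing them by eye. The following is only a minimal sketch: DependencyDiff and its methods are hypothetical helper names, not part of CoreNLP, and it assumes each dependency has already been printed as one string in the format used above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): diffs two lists of dependency triples.
class DependencyDiff {

    // Returns the triples that occur in `left` but not in `right`, in input order.
    static Set<String> missingFrom(List<String> left, List<String> right) {
        Set<String> rightSet = new LinkedHashSet<>();
        for (String triple : right) {
            rightSet.add(normalize(triple));
        }
        Set<String> missing = new LinkedHashSet<>();
        for (String triple : left) {
            String normalized = normalize(triple);
            if (!rightSet.contains(normalized)) {
                missing.add(normalized);
            }
        }
        return missing;
    }

    // Strips all whitespace so "nn(a-1,b-2)" and "nn(a-1, b-2)" compare equal.
    static String normalize(String triple) {
        return triple.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // A few lines taken from the two outputs in the question.
        List<String> fromCode = Arrays.asList(
                "root(ROOT-0, worker-3)",
                "nsubj(say-12, authorities-11)");
        List<String> fromDemo = Arrays.asList(
                "root(ROOT-0, say-12)",
                "nsubj(say-12, authorities-11)");
        System.out.println("Only in code output: " + missingFrom(fromCode, fromDemo));
        System.out.println("Only in demo output: " + missingFrom(fromDemo, fromCode));
    }
}
```

The comparison is case-sensitive on purpose, so a difference such as ebola-9 versus Ebola-9 is reported as well. Once the annotator list has been changed, both directions of the diff should come back empty if the two parses agree.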

Sebastian Schuster answered Nov 06 '22 18:11