 

Coreference Resolution using OpenNLP

Tags:

nlp

opennlp

I want to do coreference resolution using OpenNLP. The Apache documentation (Coreference Resolution) doesn't explain how to actually do it. Does anybody have any docs or a tutorial on how to do this?

Khairul asked Dec 25 '11 13:12


2 Answers

I recently ran into the same problem and wrote up some blog notes for using OpenNLP 1.5.x tools. It's a bit dense to copy in its entirety, so here's a link with more details.


At a high level, you need to load the appropriate OpenNLP coreference model libraries and also the WordNet 3.0 dictionary. Given those dependencies, initializing the linker object is pretty straightforward:

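import opennlp.tools.coref.DefaultLinker;
import opennlp.tools.coref.Linker;
import opennlp.tools.coref.LinkerMode;

// LinkerMode should be TEST (I tried LinkerMode.EVAL first before realizing that was the problem)
Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST);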
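
If the linker fails at startup, a missing model or WordNet directory is a likely cause. Below is a minimal pre-flight sketch. It assumes, from my reading of the 1.5.x coref sources, that the WordNet dictionary location is read from the `WNSEARCHDIR` system property (normally passed as `-DWNSEARCHDIR=...` on the command line). The class and method names (`CorefPreflight`, `prepare`) are made up for illustration, and only JDK calls are used:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CorefPreflight {

    /**
     * Verifies that the coref model directory and the WordNet dict directory
     * both exist, then sets the WNSEARCHDIR system property (assumed to be the
     * property the coref component reads) to the WordNet dict directory.
     *
     * @return the coref model directory as a string, suitable for passing to
     *         the DefaultLinker constructor
     */
    public static String prepare(String corefModelDir, String wordNetDictDir) {
        Path coref = Paths.get(corefModelDir);
        Path dict = Paths.get(wordNetDictDir);

        if (!Files.isDirectory(coref)) {
            throw new IllegalArgumentException("Coref model directory not found: " + coref);
        }
        if (!Files.isDirectory(dict)) {
            throw new IllegalArgumentException("WordNet dict directory not found: " + dict);
        }

        // Equivalent to launching the JVM with -DWNSEARCHDIR=<dict>,
        // as long as this runs before the linker is constructed.
        System.setProperty("WNSEARCHDIR", dict.toString());
        return coref.toString();
    }
}
```

You would call something like `CorefPreflight.prepare("lib/opennlp/coref", "lib/wordnet/dict")` before constructing the linker; the WordNet path here is only an example location.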

Using the Linker, however, is a bit less obvious. You need to:

  1. Break the content down into sentences and the corresponding tokens
  2. Create a Parse object for each sentence
  3. Wrap each sentence Parse so as to indicate the sentence ordering:

    final DefaultParse parseWrapper = new DefaultParse(parse, idx);
  4. Iterate over each sentence parse and use the Linker to get the Mention objects from each parse:

    final Mention[] extents =
       _linker.getMentionFinder().getMentions(parseWrapper);
  5. Finally, use the Linker to identify the distinct entities across all of the Mention objects:

    DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions);
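
For step 1, here is a rough sketch of producing the sentence and token arrays that the later steps consume. Note that this swaps in the JDK's `java.text.BreakIterator` rather than OpenNLP's own sentence detector and tokenizer, purely so the example runs without any model files; OpenNLP's tools will likely match the parser's training data better. `PlainSegmenter` is a hypothetical name, not part of OpenNLP:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PlainSegmenter {

    /** Splits text into trimmed, non-empty sentences. */
    public static String[] sentences(String text) {
        return split(text, BreakIterator.getSentenceInstance(Locale.ENGLISH));
    }

    /** Splits one sentence into tokens, dropping whitespace-only pieces. */
    public static String[] tokens(String sentence) {
        return split(sentence, BreakIterator.getWordInstance(Locale.ENGLISH));
    }

    // Walks the boundaries reported by the iterator and collects each
    // non-blank segment between two consecutive boundaries.
    private static String[] split(String text, BreakIterator it) {
        List<String> parts = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts.toArray(new String[0]);
    }
}
```

Each sentence's position in the array returned by `sentences` is the index you would pass as `idx` when wrapping its parse in step 3.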
dpdearing answered Nov 01 '22 15:11


There is little coreference resolution documentation for OpenNLP at the moment except for a very short mention of how to run it in the readme.

If you're not invested in using OpenNLP, consider the Stanford CoreNLP package, which includes a Java example of how to run it, including how to perform coreference resolution. It also includes a page summarizing its performance and the papers published on the coreference package.

Chris answered Nov 01 '22 16:11