
OpenNLP vs Stanford CoreNLP

I've been comparing these two packages and am not sure which direction to go in. Briefly, what I am looking for is:

  1. Named Entity Recognition (people, places, organizations and such).
  2. Gender identification.
  3. A decent training API.

From what I can tell, OpenNLP and Stanford CoreNLP expose pretty similar capabilities. However, Stanford CoreNLP looks like it has a lot more activity whereas OpenNLP has only had a few commits in the last six months.

Based on what I saw, OpenNLP appears to make training new models easier and might be more attractive for that reason alone. However, my question is: what would others start with as the basis for adding NLP features to a Java app? I'm mostly worried about whether OpenNLP is "just mature" versus semi-abandoned.

Mike Thomsen asked Oct 13 '16 16:10



1 Answer

Full disclosure: I'm a contributor to CoreNLP, so this is a biased answer. That said, here is my view on your three criteria:

  1. Named Entity Recognition. I think CoreNLP clearly wins here, both on accuracy and ease-of-use. For one, OpenNLP has a model per NER tag, whereas CoreNLP detects all tags with a single Annotator. Furthermore, temporal resolution with SUTime is a nice perk in CoreNLP. Accuracy-wise, my anecdotal experience is that CoreNLP does better on general-purpose text.

  2. Gender identification. I think both tools are kind of poorly documented on this front. OpenNLP seems to have a GenderModel class; CoreNLP has a gender Annotator.

  3. Training API. I suspect the OpenNLP training API is easier to use when your training setup is not off-the-shelf. But if all you want to do is, e.g., train a model from a CoNLL file, both should be straightforward. Training speed tends to be faster with CoreNLP than with other tools I've tried, but I haven't benchmarked it formally, so take that with a grain of salt.
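
To make the "train from a CoNLL file" point concrete, here is a minimal sketch of the data-preparation step if you want to try the same annotated data in both tools. It assumes a simplified two-column, tab-separated layout (token, then label, with `O` for non-entities, no BIO prefixes, and a blank line between sentences), which is the kind of column file CoreNLP's CRF-based NER is typically trained on, and rewrites it into the one-sentence-per-line `<START:type> ... <END>` markup that OpenNLP's name finder training data uses. The class name `ConllToOpenNlp`, the labels, and the sample sentences are hypothetical; real CoNLL files often carry more columns and BIO tags, so treat this as a starting point rather than a finished converter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: column-style NER data -> OpenNLP name finder markup. */
final class ConllToOpenNlp {

    /**
     * Converts "token<TAB>label" lines (blank line = sentence break, "O" = no entity)
     * into sentences annotated as "<START:label> tokens <END>".
     * Adjacent tokens that share a label are merged into a single span.
     */
    static List<String> convert(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String openLabel = null; // label of the span currently open, or null

        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Sentence boundary: close any open span and emit the sentence.
                if (openLabel != null) {
                    append(current, "<END>");
                    openLabel = null;
                }
                flush(current, sentences);
                continue;
            }
            String[] cols = line.split("\t");
            String token = cols[0].trim();
            String label = cols.length > 1 ? cols[1].trim() : "O";
            String entity = label.equals("O") ? null : label;

            // Close the open span when the label changes or the entity ends.
            if (openLabel != null && !openLabel.equals(entity)) {
                append(current, "<END>");
                openLabel = null;
            }
            // Open a new span when an entity starts.
            if (entity != null && openLabel == null) {
                append(current, "<START:" + entity + ">");
                openLabel = entity;
            }
            append(current, token);
        }
        // Handle input that does not end with a blank line.
        if (openLabel != null) {
            append(current, "<END>");
        }
        flush(current, sentences);
        return sentences;
    }

    private static void append(StringBuilder sb, String text) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(text);
    }

    private static void flush(StringBuilder sb, List<String> out) {
        if (sb.length() > 0) {
            out.add(sb.toString());
            sb.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Pierre\tPERSON", "Vinken\tPERSON", ",\tO", "61\tO", "years\tO", "old\tO",
                "",
                "Stanford\tORGANIZATION", "hired\tO", "him\tO");
        // Prints one annotated sentence per line.
        convert(input).forEach(System.out::println);
    }
}
```

Because the output keeps every entity type in one file, it also illustrates the design difference from point 1: whether you train one OpenNLP model per type or a single model over all types is a choice you make when filtering this file, whereas a CoreNLP column file is normally consumed as-is by one classifier.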

Gabor Angeli answered Sep 18 '22 19:09