I've been doing a little comparison of these two packages and am not sure which direction to go in. What I am looking for briefly is:
From what I can tell, OpenNLP and Stanford CoreNLP expose pretty similar capabilities. However, Stanford CoreNLP looks like it has a lot more activity whereas OpenNLP has only had a few commits in the last six months.
Based on what I saw, OpenNLP appears to be easier to train new models and might be more attractive for that reason alone. However, my question is what would others start with as the basis for adding NLP features to a Java app? I'm mostly worried as to whether OpenNLP is "just mature" versus semi-abandoned.
A small difference is that OpenNLP is Java whereas NLTK is Python. So your preference can come into play. Another difference is that NLTK has build in methods for downloading corpora. If you were a little more specific about what you wanted, people could give you better advice.
In full disclosure, I'm a contributor to CoreNLP, so this is a biased answer. But, in my view on your three criteria:
Named Entity Recognition: I think CoreNLP clearly wins here, both on accuracy and ease-of-use. For one, OpenNLP has a model per NER tag, whereas CoreNLP detects all tags with a single Annotator. Furthermore, temporal resolution with SUTime is a nice perk in CoreNLP. Accuracy-wise, my anecdotal experience is that CoreNLP does better on general-purpose text.
Gender identification. I think both tools are kind of poorly documented on this front. OpenNLP seems to have a GenderModel class; CoreNLP has a gender Annotator.
Training API. I suspect the OpenNLP training API is easier-to-use for not off-the-shelf training. But, if all you want to do is, e.g., train a model from a CoNLL file, both should be straightforward. Training speed tends to be faster with CoreNLP than other tools I've tried, but I haven't benchmarked it formally, so take that with a grain of salt.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With