I’m looking for a library that analyzes text and extracts entities.
The type or classification of an entity is not critical; what matters is identifying anything in the text that is worth noting. The universe of entities in this case is unbounded, so a fixed dictionary won’t do.
There are a couple of web services that do this (NERD lets you compare their results, which is pretty useful: http://nerd.eurecom.fr/documentation), but I’m looking for a local library, not a remotely hosted service. I’d prefer Java or .NET, but if it’s a good library I’ll learn whatever language it’s written in.
There are a few older threads on similar topics, and I was hoping to find new developments in this area and/or libraries built on top of lower-level NLP libraries.
Does anyone know about a good library that does a decent job?
I've researched, but never used, the following hosted entity identification services:
OpenCalais
AlchemyAPI
If you are comfortable with Perl, there are several part-of-speech taggers available; Lingua::TreeTagger and Lingua::BrillTagger come to mind (found via Google).
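To give a rough idea of how tagger output could feed entity identification without a fixed dictionary, here is a minimal sketch in Java. It assumes the text has already been tokenized and tagged with Penn Treebank style tags (NNP/NNPS for proper nouns) by some tagger; the class and method names are made up for illustration and do not come from any of the libraries mentioned above. It just groups runs of consecutive proper-noun tokens into candidate entities, which is a crude heuristic and not a full entity recognizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any tagger library.
public class ProperNounChunker {

    // Groups consecutive tokens tagged NNP or NNPS into candidate entities.
    // tokens[i] is the word and tags[i] is its part-of-speech tag.
    public static List<String> extractCandidates(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> candidates = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NNP")) {
                // Extend the current run of proper nouns.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            } else if (current.length() > 0) {
                // A non-proper-noun token closes the current run.
                candidates.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the input.
        if (current.length() > 0) {
            candidates.add(current.toString());
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Assumed tagger output for "Barack Obama visited New York yesterday ."
        String[] tokens = {"Barack", "Obama", "visited", "New", "York", "yesterday", "."};
        String[] tags = {"NNP", "NNP", "VBD", "NNP", "NNP", "NN", "."};
        System.out.println(extractCandidates(tokens, tags)); // prints [Barack Obama, New York]
    }
}
```

Since you said the entity type doesn't matter, this kind of untyped grouping may be a reasonable starting point, though it will miss entities that are not proper nouns and will depend heavily on the tagger's accuracy.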