
Named Entity Recognition Libraries for Java [closed]

I am looking for a simple but "good enough" Named Entity Recognition library (and dictionary) for Java. I want to process emails and documents and extract some basic information such as names, places, addresses and dates.

I've been looking around, and most options seem to be on the heavy side: full-blown NLP projects.

Any recommendations?

webclimber asked Oct 09 '08 16:10


People also ask

Which are standard libraries used for named entity recognition?

Stanford Named Entity Recognizer (SNER): this Java tool, developed by Stanford University, is widely considered a standard library for entity extraction. It is based on Conditional Random Fields (CRF) and provides pre-trained models for extracting person, organization, location, and other entities.

Which is best model for named entity recognition?

There are two main models used to achieve this goal: Ontology-based models and Deep Learning-based models. Ontology-based Named Entity Recognition uses a knowledge-based recognition process that relies on lists of datasets, such as a list of company names for the company category, to make inferences.

Why is named entity recognition difficult?

Ambiguity and abbreviations: one of the major challenges in identifying named entities is language itself. The same word can have multiple meanings or play different roles in different sentences. Another major challenge is classifying similar words from texts.

Is named entity recognition supervised or unsupervised?

NER tagging is a supervised task. You need a training set of labeled examples to train a model for that. However, there is some unsupervised work one can do to slightly improve the performance of models.


1 Answer

You might want to have a look at one of my earlier answers to a similar problem.

Other than that, most lighter NER systems depend a lot on the domain used. You will find a whole lot of tools and papers about biomedical NER systems, for example. In addition to my previous post (which already contains my main recommendation if you want to do NER), here are some more tools you might want to look into:

  • The Stanford CRF-NER
  • The Postech Biomedical NER System if you are interested in this particular domain
  • OpenCalais seems to be a commercial system. There are UIMA wrappers for OpenCalais, but they seem dated. There is also a dictionary-based Context-Mapper annotator for UIMA that may help you out. Be aware that UIMA comes with a significant learning curve ;-)
  • OpenNLP also has an NER tool.
  • Balie does NER, too, among other things.
  • ABNER does NER, but again it's focused on the biomedical domain.
  • The JULIE Lab Tools from the University of Jena, Germany, also do NER. They have standalone versions and UIMA analysis engines.
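
To give a rough idea of what the dictionary-based approach mentioned above amounts to, here is a minimal sketch using only the Java standard library. It is not the API of any of the tools listed; the class `GazetteerTagger`, its methods, and the "LABEL:text" output format are all made up for illustration, and a real system would use a proper tokenizer and a much larger dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of dictionary (gazetteer) lookup; not a library API.
final class GazetteerTagger {
    // Maps a lower-cased phrase to its label, e.g. "new york" -> "PLACE".
    private final Map<String, String> entries = new HashMap<>();
    // Longest dictionary phrase, measured in whitespace-separated words.
    private int maxWords = 1;

    void add(String phrase, String label) {
        String key = phrase.trim().toLowerCase(Locale.ROOT);
        entries.put(key, label);
        maxWords = Math.max(maxWords, key.split("\\s+").length);
    }

    // Scans left to right and prefers the longest dictionary phrase at each position.
    // Each hit is returned as "LABEL:surface text".
    List<String> tag(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            boolean matched = false;
            int longest = Math.min(maxWords, words.length - i);
            for (int len = longest; len >= 1 && !matched; len--) {
                String surface = String.join(" ", Arrays.copyOfRange(words, i, i + len));
                // Strip leading and trailing punctuation so "Paris," still matches "paris".
                String cleaned = surface.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
                String label = entries.get(cleaned.toLowerCase(Locale.ROOT));
                if (label != null) {
                    found.add(label + ":" + cleaned);
                    i += len;          // skip past the matched phrase
                    matched = true;
                }
            }
            if (!matched) {
                i++;                   // no phrase starts here, move on
            }
        }
        return found;
    }
}
```

With "Alice Smith" registered as PERSON and "New York" and "Paris" as PLACE, tagging "Alice Smith flew from New York to Paris, then home." yields the three entries in order. Anything not in the dictionary is simply missed, which is why this only counts as "good enough" for narrow, well-known vocabularies.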

One additional remark: you won't get away without tokenizing the input. Tokenization of natural language is slightly non-trivial, which is why I suggest you use a toolbox that does both for you.
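
To illustrate the tokenization point, here is a small sketch that compares naive whitespace splitting with the JDK's own `java.text.BreakIterator`. The class and method names (`TokenizerSketch`, `whitespaceTokens`, `wordTokens`) are made up for this example, and `BreakIterator` is only a stand-in: the NER toolkits above ship their own tokenizers, which may split text differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; only BreakIterator itself is a real JDK API.
final class TokenizerSketch {
    // Naive approach: split on whitespace, which leaves punctuation glued to words.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Locale-aware word boundaries from the JDK; whitespace-only segments are dropped.
    static List<String> wordTokens(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }
}
```

For "Meet Bob in Paris, please." the whitespace version keeps "Paris," as a single token, while the word iterator separates "Paris" from the comma. Abbreviations, dates and e-mail addresses are where the simple approaches start to disagree, so it is worth checking the tokenizer against your own documents.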

Aleksandar Dimitrov answered Sep 23 '22 10:09