Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Named entity recognition with Java

I would like to use named entity recognition (NER) to find adequate tags for texts in a database. Instead of using tools like NLTK or Lingpipe I want to build my own tool.

So my questions are:

  • Which algorithm should I use?

  • How hard is to build this tool?

like image 881
aniss.bouraba Avatar asked Apr 06 '11 18:04

aniss.bouraba


1 Answers

I did this some time ago when I studied Markov chains.

Anyway, the answers are:

Which algorithm should I use?

Stanford NLP for example uses Conditional Random Field (CRF). If you are not trying to do this effectively, you are like dude from Jackass 3d who was pissing in the wind. There is no simple way to parse human language, as it's construction is complex and it has tons of exceptions.

How hard is to build this tool?

Well if you know what you are doing, then it's not that hard at all. The process of entering the rules and logic can be annoying and time consuming, and fixing bugs can be nontrivial. But in 20 years, you can make something almost useful (for yourself).

like image 198
Margus Avatar answered Sep 24 '22 01:09

Margus