
How to identify person names in text (Java)

I have some input text that contains one or more person names. I do not have a dictionary of these names. Which Java library can help me identify the names in my input text? I looked through OpenNLP, but did not find any example, guide, or even a description of how to apply it in my code. (I saw the Javadoc, but it is fairly poor documentation for a project like this.)

I want to find names in arbitrary text. If the input text is "My friend Joe Smith went to the store.", then I want to get "Joe Smith". I assume there must be engines, built on large enough dictionaries or able to generalize from smaller ones, that can recognize human names.

asked Dec 09 '09 18:12 by Denis


2 Answers

I'd look into LingPipe. Check out this demo. By the way, what you are trying to do is called "named entity recognition". It's a difficult CS problem to get right.
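To illustrate why this is hard, here is a deliberately naive baseline in plain Java. It is a capitalization heuristic, not LingPipe's approach and not a trained NER model: it simply treats every run of two or more capitalized words as a candidate name. The class and method names (`NaiveNameFinder`, `candidateNames`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveNameFinder {

    /**
     * Returns every run of two or more consecutive capitalized words.
     * Deliberately naive heuristic, NOT named entity recognition:
     * it cannot tell a person ("Joe Smith") from a place ("New York").
     */
    public static List<String> candidateNames(String text) {
        List<String> names = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            // Strip leading/trailing punctuation such as quotes, commas, periods.
            String word = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            boolean capitalized = !word.isEmpty() && Character.isUpperCase(word.charAt(0));
            if (capitalized) {
                run.add(word);
            } else {
                // A lowercase word ends the current run.
                flush(run, names);
            }
            // Trailing punctuation ("Smith," or "York.") also ends the current run.
            if (raw.matches(".*\\p{Punct}")) {
                flush(run, names);
            }
        }
        flush(run, names);
        return names;
    }

    // Emits the run as one candidate if it has at least two words, then clears it.
    private static void flush(List<String> run, List<String> names) {
        if (run.size() >= 2) {
            names.add(String.join(" ", run));
        }
        run.clear();
    }

    public static void main(String[] args) {
        System.out.println(candidateNames("My friend Joe Smith went to the store."));
        System.out.println(candidateNames("Joe Smith flew to New York."));
    }
}
```

On the question's sentence this prints `[Joe Smith]`, but for "Joe Smith flew to New York." it prints `[Joe Smith, New York]`: it has no notion of entity type, which is exactly the person-versus-place distinction a real NER model is meant to make.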

answered Oct 22 '22 18:10 by MattMcKnight


OpenNLP has named entity recognition; see the "English Name Finding" section of its documentation. In my experience, though, it identifies entities but no usable type tags come with them. (To be precise, I found the tags to be ambiguously assigned.) So for the sentence "My friend Joe Smith went to the Walmart store", OpenNLP identifies two named entities, "Joe Smith" and "Walmart", but I couldn't get it to tag "Joe Smith" as Person and "Walmart" as Organization.

As suggested by Matt, you can try LingPipe, though it's a commercial tool. Some of the open source alternatives are MorphAdorner and Stanford NER.

answered Oct 22 '22 19:10 by Shashikant Kore