Identifying a person's name vs. a dictionary word

Question

Is there some way to recognize that a word is likely to be/is not likely to be a person's name?

So if I see the word "understanding" I would get a probability of 0.01, whereas "Johnson" would return 0.99, "Smith" would return 0.75, and "Apple" would return 0.15.

Is there any way to do this?

The goal is that if someone searches for, say, "Charles Darwin galapagos", the search engine guesses that it should search the author field for "Charles" and "Darwin" and the title and abstract fields for "galapagos".

Ask About Monica · Accepted Answer

My quick hack would be this:

Get the list of names in order of popularity from the census bureau; it's freely available. Give each name a normalized popularity score (1.0 = most popular, 0.0 = least).

Then get an open-source dictionary and do some research to pull together a frequency score for every word. You can find a frequency list at Wiktionary. Assign every word a popularity score from 1.0 to 0.0. The convenient thing is that if you can't find a word on the frequency list, you get to assume it's a pretty uncommon word.

Look for a word on both lists. If it's on just one or the other, you're done. If it's on both, use a formula to compute a weighted probability... something like (Name Popularity) / (Name Popularity + Other Popularity). If it's not on either list, it's probably a name.
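As a rough sketch of the steps above (not a tested production approach), the lookup could look something like this in Python. The score tables are made-up toy values standing in for the census and word-frequency data, and the names `name_probability`, `route_query`, `unknown_prior`, and `threshold` are hypothetical, chosen only for illustration. The 0.9 returned for words on neither list is an assumption; the answer only says such a word is "probably a name".

```python
def normalize_ranks(ranked_items):
    """Map a list ordered most-popular-first to scores from 1.0 down to 0.0."""
    n = len(ranked_items)
    if n == 1:
        return {ranked_items[0].lower(): 1.0}
    return {item.lower(): 1.0 - i / (n - 1) for i, item in enumerate(ranked_items)}


# Toy scores (invented for this example, not real census or frequency data).
NAME_SCORES = {"smith": 1.0, "johnson": 0.9, "charles": 0.8, "darwin": 0.3, "apple": 0.1}
WORD_SCORES = {"understanding": 0.6, "apple": 0.5, "smith": 0.2, "galapagos": 0.05}


def name_probability(word, name_scores, word_scores, unknown_prior=0.9):
    """Estimate how likely `word` is a person's name from two popularity tables."""
    key = word.lower()
    name_pop = name_scores.get(key)
    word_pop = word_scores.get(key)
    if name_pop is None and word_pop is None:
        return unknown_prior  # on neither list: assumed to be "probably a name"
    if word_pop is None:
        return 1.0  # only on the name list
    if name_pop is None:
        return 0.0  # only on the word list
    total = name_pop + word_pop
    if total == 0:
        return 0.5  # both scores are 0.0, so there is no evidence either way
    return name_pop / total  # (Name Popularity) / (Name Popularity + Other Popularity)


def route_query(query, name_scores, word_scores, threshold=0.5):
    """Split query terms into likely author names and title/abstract terms."""
    fields = {"author": [], "title_abstract": []}
    for term in query.split():
        p = name_probability(term, name_scores, word_scores)
        fields["author" if p >= threshold else "title_abstract"].append(term)
    return fields


print(route_query("Charles Darwin galapagos", NAME_SCORES, WORD_SCORES))
```

With these toy tables, "Charles" and "Darwin" appear only on the name list and "galapagos" only on the word list, so the query splits the way the question describes. Real data would need a tuned threshold, and case could be used as an extra signal if the search box preserves it.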

Qnan · Answer

A related task in natural language processing is known as Named Entity Recognition and deals with names of people, organizations, locations, etc.

Most models designed to solve this problem are statistical in nature and use both context and prior knowledge in their predictions. There are a number of open-source implementations one can use, e.g. Stanford NER (see the online demo).

Tags:

dictionary

algorithm

search

nlp

Jordan Reiter

2 Answers

Ask About Monica

Qnan
