
Variations in spelling of first name

As part of a contact management system, I have a large database of names. People edit it frequently, and as a result the same person ends up existing under different forms (John Smith and Jonathan Smith). I looked into word similarity, but it's easy to think of name variations that are not similar at all (Richard vs. Dick). Is there a list of common English first-name variations that I could use to detect and correct such errors?

Chris asked Sep 28 '10 02:09




2 Answers

I would crawl all the Wikipedia pages on people's names (a dump of Wikipedia data is available), e.g., http://en.wikipedia.org/wiki/Teresa (from http://en.wikipedia.org/wiki/Category:English_given_names), and create an index that you can use to suggest correct forms to people (ranking the suggestions by the number of first-name variants in your database). Unfortunately, I do not know of an existing database like this.
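A minimal sketch of how such an index and ranking might look, assuming the variant groups have already been extracted from the crawled pages. The `VARIANT_GROUPS` data and the function names are hypothetical and purely illustrative, and the ranking is one possible reading of the idea: a canonical name scores higher the more records in your database already use one of its variants.

```python
from collections import Counter, defaultdict

# Hypothetical variant groups (canonical name -> known variants), standing in
# for data extracted from the crawled given-name pages.
VARIANT_GROUPS = {
    "Teresa": {"Teresa", "Theresa", "Tess", "Terry"},
    "Terence": {"Terence", "Terrence", "Terry"},
}


def build_index(groups):
    """Map every lowercased variant to the canonical names it may stand for."""
    index = defaultdict(set)
    for canonical, variants in groups.items():
        for variant in variants | {canonical}:
            index[variant.lower()].add(canonical)
    return index


def suggest(name, index, groups, db_first_names):
    """Return candidate canonical names for `name`, best match first.

    Assumption: a candidate's score is the number of database records whose
    first name is any variant of that candidate.
    """
    counts = Counter(n.lower() for n in db_first_names)

    def score(canonical):
        variants = groups[canonical] | {canonical}
        return sum(counts[v.lower()] for v in variants)

    candidates = index.get(name.lower(), set())
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(candidates, key=lambda c: (-score(c), c))


if __name__ == "__main__":
    index = build_index(VARIANT_GROUPS)
    db = ["Terry", "Theresa", "Teresa", "Tess", "Terence"]
    print(suggest("Terry", index, VARIANT_GROUPS, db))
```

With the sample data above, "Terry" is ambiguous, and "Teresa" is ranked ahead of "Terence" because more records in the sample database use a Teresa variant. A name that is not in the index yields an empty list, so it can be left unchanged.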

Skarab answered Oct 16 '22 21:10



This thread points to a list of nickname/first name maps from the census:

http://deron.meranda.us/data/nicknames.txt
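A rough sketch of how a nickname list like this could be used to flag likely duplicates. The file layout is an assumption here (tab-separated, nickname in the first column and formal name in the second), so check the actual file before relying on it. The inline `SAMPLE` data and all function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for the downloaded file; the real layout may differ.
SAMPLE = "dick\trichard\nrich\trichard\njon\tjonathan\n"


def load_nicknames(text):
    """Build a lookup from nickname to the set of formal names it may mean."""
    formal = defaultdict(set)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank, malformed, or comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        nickname, name = row[0].strip().lower(), row[1].strip().lower()
        formal[nickname].add(name)
    return formal


def formal_names(first, formal):
    """Return the name itself plus every formal name it is a nickname for."""
    first = first.strip().lower()
    return formal.get(first, set()) | {first}


def maybe_same_person(a, b, formal):
    """Flag two (first, last) pairs as a possible duplicate.

    Two contacts match when the surnames are equal (ignoring case) and their
    first names share at least one formal name.
    """
    (first_a, last_a), (first_b, last_b) = a, b
    if last_a.strip().lower() != last_b.strip().lower():
        return False
    return bool(formal_names(first_a, formal) & formal_names(first_b, formal))


if __name__ == "__main__":
    lookup = load_nicknames(SAMPLE)
    print(maybe_same_person(("Dick", "Smith"), ("Richard", "Smith"), lookup))
```

This only flags candidates. Since a nickname can map to several formal names, it is probably safer to show the matches to a user for confirmation than to merge records automatically.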

Luke answered Oct 16 '22 23:10
