As part of a contact management system I have a large database of names. People frequently edit this and as a result we run into issues of the same person existing in different forms (John Smith and Jonathan Smith). I looked into word similarity but it's easy to think of name variations which are not similar at all (Richard vs Dick). I was wondering if there was a list of common English first name variations that I could use to detect and correct such errors.
A name variant is an alternative of a name that is considered to be equivalent to that name, but which differs from the name in its particular external form. In other words, the two names are considered somehow equivalent and can be substituted for the other in some context.
We combed through more than 700,000 baby names registered on the BabyCenter site and found those with the most alternate spellings for both boys and girls. The winners? Caden, with 52 different spellings, and Aaliyah, with a whopping 89!
Unfortunately, I do not know of such a database. I would crawl the Wikipedia pages on given names (a dump of Wikipedia data is available), e.g. http://en.wikipedia.org/wiki/Teresa (from http://en.wikipedia.org/wiki/Category:English_given_names), and build an index of variants that you can use to suggest correct forms to people, ranking the suggestions by how often each first-name variant appears in your database.
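A minimal sketch of that index idea, assuming the variant groups have already been extracted somehow. The groups below are hand-written placeholders rather than real Wikipedia data, and `build_index` and `suggest_forms` are hypothetical names used only for illustration:

```python
from collections import Counter

# Hypothetical variant groups. In practice these would be extracted from the
# Wikipedia given-name pages mentioned above; the values here are placeholders.
VARIANT_GROUPS = [
    {"teresa", "theresa", "tess", "tessa"},
    {"richard", "rich", "rick", "dick"},
]

def build_index(groups):
    """Map each lowercase variant to the (frozen) group it belongs to."""
    index = {}
    for group in groups:
        frozen = frozenset(name.lower() for name in group)
        for name in frozen:
            index[name] = frozen
    return index

def suggest_forms(first_name, index, db_first_names):
    """Return variants of first_name that occur in the database,
    most frequent first (ties broken alphabetically)."""
    counts = Counter(name.lower() for name in db_first_names)
    group = index.get(first_name.lower(), frozenset())
    present = [name for name in group if counts[name] > 0]
    return sorted(present, key=lambda name: (-counts[name], name))

index = build_index(VARIANT_GROUPS)
print(suggest_forms("Tess", index, ["Teresa", "Teresa", "Theresa", "Bob"]))
```

Ranking by frequency in your own data means the form your users already prefer is suggested first. A name that belongs to no group simply yields an empty list.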
This thread points to a list of nickname/first name maps from the census:
http://deron.meranda.us/data/nicknames.txt
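Here is a rough sketch of how a nickname map like this could be used to flag likely duplicates. I am assuming a tab-separated layout with the nickname in the first column and the full name in the second; check the actual file, since its format may differ. `SAMPLE`, `load_nicknames` and `same_person_candidate` are made-up names for illustration:

```python
import csv
import io

# Assumed layout: nickname <TAB> full name (any extra columns are ignored).
# This sample is invented for illustration and is not copied from the file.
SAMPLE = "dick\trichard\nrich\trichard\njon\tjonathan\n"

def load_nicknames(text):
    """Parse tab-separated text into {nickname: set of full names}."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        nick, full = row[0].strip().lower(), row[1].strip().lower()
        mapping.setdefault(nick, set()).add(full)
    return mapping

def same_person_candidate(a, b, mapping):
    """True if the two first names share a form: the name itself
    or any full name it is listed as a nickname of."""
    a, b = a.lower(), b.lower()
    forms_a = mapping.get(a, set()) | {a}
    forms_b = mapping.get(b, set()) | {b}
    return bool(forms_a & forms_b)

nicknames = load_nicknames(SAMPLE)
print(same_person_candidate("Dick", "Richard", nicknames))
print(same_person_candidate("Dick", "Jon", nicknames))
```

A match here only means the two records are candidates for merging. You would still want to compare surnames and other fields before treating them as the same person.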