
Plain, computer parseable lists of common first names?

Tags: dataset

I need a list of common first names for people, like "Bill", "Gordon", "Jane", etc. Is there a free list of lots of known names, so that I don't have to type them out myself? Something that a program can easily parse, for example to fill an array?

I'm not worried about:

  • Knowing if a name is masculine or feminine (or both)
  • If the dataset has a whole pile of false positives
  • If there are names that aren't on it; obviously no dataset like this will be complete
  • If there are 'duplicates', i.e. I don't care if the dataset lists "Bill" and "William" and "Billy" as different names. I'd rather have more data than less
  • Knowing the popularity of the name

I know Wikipedia has a list of most popular given names, but that's all in an HTML page and mangled up with horrible wiki syntax. Is there a better way to get some sample data like this without having to screen scrape Wikipedia?

Amandasaurus asked Sep 20 '09 21:09





2 Answers

  • A CSV from the General Register Office of Scotland with all the forenames registered there in 2007.

  • Another large set of first names in CSV format and SQL format too (but they didn't say which DB dumped the SQL).

  • A GitHub page with the top 1000 baby names from 1880 to 2009, taken from the Social Security Administration and already parsed into a CSV for you.

  • CSV of baby names and meanings from a Princeton CS page.

That ought to be enough to get you started, I'd think.
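Once one of these CSV files is downloaded, filling an array with it takes only a few lines. What follows is a minimal Python sketch that is not tied to any particular dataset listed above: the column header `name` and the sample rows are assumptions made for illustration, so adjust them to match whichever file you pick.

```python
import csv
import io


def load_first_names(csv_file, column="name"):
    """Return a sorted list of unique names from one column of a CSV file.

    `column` is an assumed header; change it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    names = set()
    for row in reader:
        # Missing columns and empty cells are skipped rather than raising.
        value = (row.get(column) or "").strip()
        if value:
            names.add(value)
    return sorted(names)


# Small in-memory sample standing in for a downloaded file (made-up rows).
sample = io.StringIO("year,name,sex\n2007,Jane,F\n2007,Bill,M\n2008,Jane,F\n")
first_names = load_first_names(sample)
print(first_names)
```

For a real file, pass an open file object instead of the in-memory sample, for example `open("names.csv", newline="", encoding="utf-8")`, where `names.csv` is a placeholder path. Collecting into a set first drops the repeats you get when a file lists the same name once per year.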

Mark Rushakoff answered Sep 20 '22 16:09



You can use the Wikipedia API (http://en.wikipedia.org/w/api.php) to retrieve the list of pages in a specific category. Category:Given names looks like a good place to start:

http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmnamespace=0&cmlimit=500&cmtitle=Category:Given_names

Part of the result from this URL looks like this:

  <cm pageid="5797824" ns="0" title="Abdou" />
  <cm pageid="5797863" ns="0" title="Abdu" />
  <cm pageid="859035" ns="0" title="Abdul Aziz" />
  <cm pageid="6504818" ns="0" title="Abdul Qadir" />

Look at the API documentation to choose a suitable output format and query parameters, and check which categories hold the names you want.
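As a rough illustration of the query above, here is a Python sketch that asks for JSON instead of XML and follows the API's continuation values, so that more than the first 500 titles come back. The query parameters are the ones from the URL above plus `format=json`; the function names, the injectable `fetch` argument, and the User-Agent string are my own choices for the example, not part of the API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # A descriptive User-Agent is polite; this value is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "name-list-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_titles(category, fetch=fetch_json):
    """Yield the title of every main-namespace page in `category`."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,
        "cmlimit": 500,
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Feed the continuation values back in to get the next batch.
        params = {**params, **data["continue"]}
```

Calling `list(category_titles("Category:Given_names"))` would then collect the titles into a list. Because `cmnamespace=0` only returns articles, names filed under subcategories will not show up, so you may need to run the same call on those subcategories as well.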

P.S. The wikitext of the page you linked to contains the names in a form that is easy to extract with a regexp. Likewise, the link titles in the rendered HTML page have “(name)” attached to the name itself.
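If you collect titles that carry such a “(name)” suffix, a small regexp can strip it. This Python sketch assumes the qualifier is a trailing parenthesised phrase ending in the word "name"; the pattern and the function name are illustrative, and any other kind of suffix is deliberately left untouched.

```python
import re

# Matches a trailing parenthesised qualifier such as " (name)".
# Assumption: the qualifier ends in the whole word "name".
QUALIFIER = re.compile(r"\s*\([^()]*\bname\)$")


def strip_qualifier(title):
    """Return `title` without a trailing "(... name)" qualifier."""
    return QUALIFIER.sub("", title)


print(strip_qualifier("Gordon (name)"))
print(strip_qualifier("Abdul Aziz"))
```

The word boundary in the pattern keeps a qualifier like "(surname)" in place, so such entries stay recognisable and can be filtered out separately.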

Juicy Scripter answered Sep 21 '22 16:09
