How to guess the nationality of a person from their surname?

Tags:

prediction

What approach can I use to predict a person's nationality from their surname?

I have a huge list of texts and their authors' surnames. I would like to identify which texts were written by Latin-language speakers and which were written by native English speakers, in order to study whether certain writing-style patterns differ between the two groups.

I have searched Google and PubMed for a database of surnames, but I could not find any that is freely accessible. Another approach is to use some regexes, for example ".*ez" to identify Hispanic surnames such as 'Rodriguez', but that doesn't get me very far.
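A minimal sketch of the suffix-regex idea, in Python, assuming the surnames are already available as plain strings. The group names, the suffix lists in `SUFFIX_PATTERNS`, and the helpers `guess_group` and `tag_authors` are hypothetical and chosen only for illustration. They are not a validated surname resource. Names that match nothing are labelled "unknown" so they can go straight into the manual review step.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical suffix heuristics, for illustration only. The groups and
# suffixes are assumptions, not a curated surname resource: they miss many
# names and will also fire on people who are native English speakers.
SUFFIX_PATTERNS = {
    "hispanic": re.compile(r".*(ez|az|oz)"),                # e.g. Rodriguez, Diaz, Muñoz
    "italian": re.compile(r".*(ini|ino|elli|etti|ucci)"),   # e.g. Rossini, Martelli
    "portuguese": re.compile(r".*(eira|inho)"),             # e.g. Pereira, Coutinho
}


def guess_group(surname: str) -> Optional[str]:
    """Return the first group whose suffix pattern matches the whole surname."""
    name = surname.strip().lower()
    for group, pattern in SUFFIX_PATTERNS.items():
        # fullmatch anchors the pattern at both ends, so ".*ez" means "ends in ez"
        if pattern.fullmatch(name):
            return group
    return None


def tag_authors(surnames: Iterable[str]) -> Dict[str, str]:
    """Tentatively label each surname; 'unknown' entries need manual review."""
    return {s: guess_group(s) or "unknown" for s in surnames}


if __name__ == "__main__":
    for surname, group in tag_authors(["Rodriguez", "Rossini", "Smith"]).items():
        print(surname, group)
```

Running it prints one tentative label per surname, with Smith coming out as "unknown". Since a surname alone says nothing certain about where someone grew up, a match is best treated as a candidate to check by hand rather than as a classification.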

Do you have any suggestions? Since I will manually revise all the associations after making the prediction, I don't need great accuracy; any help or ideas are welcome.

dalloliogm asked Sep 27 '11 13:09



2 Answers

I don't think you can do this with any degree of reliability. A Rodriguez may well have a name of Spanish origin but could have been born and brought up anywhere. They could be second-generation British, have never had Spanish spoken around them, and so fall into the category of native English speakers.

Schroedingers Cat answered Nov 09 '22 18:11



If these are actual authors, maybe you could spider Amazon and check their 'Author information' details?

I don't think you can guess. Take Irish surnames, for example: an estimated 80,000,000 people have Irish heritage, but only 4.5 million of them live in Ireland or went through Irish education.

Dave Walker answered Nov 09 '22 18:11
