I was recently confronted with a weird yet interesting question: write a program that outputs the gender for each given name. Example: INPUT --> John Michael Britney OUTPUT --> male male female
So this is the output I expect. I tried a lot to solve it, but I really was not able to crack it. I am really thankful to this site for giving me an opportunity to share this question.
Actually, this was asked in a programming contest as a flyer problem, so I thought it could be programmed.
A statistical approach works really well: depending on the country, the precision is 95% or 99%+, with a few exceptions (Chinese names, Korean names).
Check out the GendRE API http://namsor.com/api
It automatically recognizes the culture behind a name so it can apply the appropriate dictionary (e.g. Andrea Rossini is male, Andrea Parker is female, etc.)
Don't give up.
I would take a statistical approach... you need to get your hands on a massive names database that actually has gender info... then teach your program to learn from that dataset.
The thing is you need a third variable for correlation. Something like country of origin, ethnicity, etc will narrow your odds even further. You really need that 3rd "clue"...
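To make the idea concrete, here is a minimal sketch of that statistical approach with country as the extra clue. The SAMPLE data, the train/guess function names and the (name, country, gender) record layout are all made up for illustration; a real version would load a large names database instead.

```python
from collections import Counter, defaultdict

# Hypothetical labelled sample: (first name, country code, gender).
# A real system would load a large names database instead.
SAMPLE = [
    ("kim", "dk", "male"),
    ("kim", "dk", "male"),
    ("kim", "us", "female"),
    ("kim", "us", "female"),
    ("kim", "us", "female"),
    ("john", "us", "male"),
    ("britney", "us", "female"),
]


def train(records):
    """Count genders per name, both overall and per (name, country)."""
    by_name = defaultdict(Counter)
    by_name_country = defaultdict(Counter)
    for name, country, gender in records:
        key = name.lower()
        by_name[key][gender] += 1
        by_name_country[(key, country.lower())][gender] += 1
    return by_name, by_name_country


def guess(model, name, country=None):
    """Return (gender, share of matching records) or ("unknown", 0.0)."""
    by_name, by_name_country = model
    key = name.lower()
    counts = None
    if country is not None:
        # Prefer the country-specific counts when the extra clue is given.
        counts = by_name_country.get((key, country.lower()))
    if not counts:
        # Fall back to the counts over all countries.
        counts = by_name.get(key)
    if not counts:
        return ("unknown", 0.0)
    gender, hits = counts.most_common(1)[0]
    return (gender, hits / sum(counts.values()))


if __name__ == "__main__":
    model = train(SAMPLE)
    for name in ["John", "Britney", "Kim"]:
        print(name, guess(model, name))
    print("Kim (dk)", guess(model, "Kim", "dk"))
```

The second value is only the share of records in the dataset that agree with the guess, so it is only as trustworthy as the data behind it.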
You can't do it algorithmically: you need a database to do it statistically. This SO question points to many such available resources. Do realize you'll have many, MANY misguesses -- either the Korean Kims (males) or the Northern European ones (females) may get pretty peeved at that kind of thing, for example ;-).
I have been spending time on solving this as well. My first approach was to use lists of approved names (we have those in Denmark, where I'm from), but I quickly realized that only a few countries have them. Besides that, I was getting feedback that a probabilistic guess would be much more useful, and that one should be able to filter by a country or language id. I then rebuilt it using datasets of users from social networks instead, which actually works quite well.
You can check it out at http://genderize.io
Simple example:
http://api.genderize.io?name=kim
{"name":"kim","gender":"female","probability":"0.91","count":687}
http://api.genderize.io?name=kim&country_id=dk
{"name":"kim","gender":"male","probability":"1.00","count":17,"country_id":"dk"}
What about Human-Computer Interaction as the 3rd clue?
You could have a click map such as http://css-tricks.com/tracking-clicks-building-a-clickmap-with-php-and-jquery/
Based on where the user clicks, you could determine a reasonable statistic of male vs. female. This would be used as a fallback when the name is unknown in the database.
Here's a Wikipedia excerpt on "Gender_HCI":
"Larger displays helped reduce the gender gap in navigating virtual environments. With smaller displays, males’ performance was better than females’. With larger displays, females’ performance improved and males’ performance was not negatively affected."
So have a small box and time the amount of time required to click it. ...?