
Is there open source software available that analyses a string and guesses the gender of the author?

I can't find anything other than closed-source web applications. Are there any active projects? I'd be interested in using the software in something I'm developing, and in getting involved.

rmh asked Dec 27 '08 21:12



3 Answers

Since you're assuming two categories, almost any classifier will probably do OK. Some suggestions:

  • Naive Bayes
  • Support vector machines

As an earlier commenter said, start from a known sample of text (there should be plenty; newspaper corpora might be good), then train and classify on some reasonable attributes (maybe the presence or absence of words or word pairs).

This one should be (comparatively) easy.

If you're using Python, even something as simple as the Natural Language Toolkit (see nltk.org) and its book should get you a long way there.
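To make the train-and-classify idea concrete, here is a minimal sketch using only the Python standard library rather than NLTK. It assumes presence of words and adjacent word pairs as features, with add-one smoothing. The names `extract_features` and `NaiveBayes`, the placeholder labels `label_a` and `label_b`, and the four toy sentences are all made up for illustration; a real experiment would need a large labelled corpus, and this simplified scorer only looks at features that are present in the input.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Return the set of lowercase words and adjacent word pairs in text."""
    words = text.lower().split()
    pairs = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | pairs


class NaiveBayes:
    """Simplified naive Bayes over presence features (hypothetical helper)."""

    def train(self, samples):
        # samples: iterable of (text, label) pairs
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in samples:
            self.label_counts[label] += 1
            for feature in extract_features(text):
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)
        return self

    def classify(self, text):
        total = sum(self.label_counts.values())
        # Ignore features never seen in training.
        features = extract_features(text) & self.vocabulary
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus log likelihood of each present feature.
            score = math.log(count / total)
            for feature in features:
                # Add-one smoothing so unseen features never give log(0).
                p = (self.feature_counts[label][feature] + 1) / (count + 2)
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data with placeholder labels; purely illustrative.
training = [
    ("the quick report covers quarterly numbers", "label_a"),
    ("quarterly numbers look strong this year", "label_a"),
    ("we walked along the river at dusk", "label_b"),
    ("the river was calm at dusk", "label_b"),
]
model = NaiveBayes().train(training)
print(model.classify("numbers for the quarterly report"))
```

Swapping in a different feature extractor (for example, function-word counts) only requires changing `extract_features`; the classifier itself stays the same.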

Gregg Lind answered Nov 04 '22 10:11



Here's another website that claims to do this: GenderAnalyzer. However, it relies on another website, uClassify.com, which is down as I write this. They have a contact link at the bottom for questions.

It sounds like an academic outfit: "In our lab it seems to works pretty well".

Steve Steiner answered Nov 04 '22 10:11



There are applications like "The Gender Genie" that work with a reasonable degree of success, particularly on longer texts: http://bookblog.net/gender/genie.php

It doesn't need to be entirely successful. I would have huge amounts of data to deal with, and it's mostly just for fun.

If anyone knows of anything, please do share.

Richard

rmh answered Nov 04 '22 10:11
