What is a good open source package for building flexible spam detection on a large Rails site?

My site is getting larger and it's starting to attract a lot of spam through various channels. The site has many different types of UGC (profiles, forums, blog comments, status updates, private messages, etc.). I have various mitigation efforts underway, which I hope to deploy in a blitzkrieg fashion to convince the spammers that we're not a worthwhile target. I have high confidence in what I'm doing functionality-wise, but one missing piece is killing all the old spam at once.

Here's what I have:

  • Large good/bad corpora (5-figure bad, 6- or 7-figure good). A lot of the spam has very reliable fingerprints, and the fact that I've sort of been ignoring it for 6 months helps :)
  • Large, modular Rails site deployed to AWS. It's not a huge traffic site, but we're running 8 instances with the beginnings of a SOA.
  • Ruby, Redis, Resque, MySQL, Varnish, Nginx, Unicorn, Chef, all on Gentoo

My requirements:

  1. I want it to perform reasonably well given the volume of data (therefore I'm wary of a pure ruby solution).
  2. I should be able to train multiple classifiers for different types of content (419 scams vs. botnet link spam).
  3. I would like to be able to add manual factors based on our own detective work (pattern matching, IP reuse, etc)
  4. Ultimately I want to construct a nice interface to be used with Ruby. If this requires getting my hands dirty in C or whatever, I can handle it, but I'll avoid it if I can.
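Requirements 2 and 3 can be pictured as a single scoring model: several spam labels trained side by side, with hand-written rules nudging the scores. The sketch below is a hypothetical illustration, not a recommended package. It swaps in a plain multinomial naive Bayes with Laplace smoothing, written in pure Ruby (exactly what the poster is wary of at this data volume), and the class name `SpamClassifier`, the `add_rule` method, the labels, and the `:ip_reused` context key are all made up for the example.

```ruby
# Hypothetical sketch: multinomial naive Bayes over several labels
# (e.g. :ham, :scam_419, :link_spam) plus hand-written rule factors.
class SpamClassifier
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => {token => count}
    @token_totals = Hash.new(0)                          # label => tokens seen
    @doc_counts = Hash.new(0)                            # label => documents seen
    @vocab = {}                                          # every token seen in training
    @rules = []                                          # [label, weight, predicate]
  end

  # Crude tokenizer: lowercase runs of letters, digits and URL-ish characters.
  def self.tokenize(text)
    text.downcase.scan(%r{[a-z0-9$@./:]+})
  end

  # Adds one labelled document to the counts.
  def train(label, text)
    @doc_counts[label] += 1
    self.class.tokenize(text).each do |tok|
      @word_counts[label][tok] += 1
      @token_totals[label] += 1
      @vocab[tok] = true
    end
  end

  # Manual factor: when the predicate returns truthy, `weight` is added to
  # that label's log score. The predicate receives the text and a context hash.
  def add_rule(label, weight, &predicate)
    @rules << [label, weight, predicate]
  end

  # Returns {label => log score}.
  def scores(text, context = {})
    raise "classifier has not been trained" if @doc_counts.empty?

    tokens = self.class.tokenize(text).select { |tok| @vocab.key?(tok) } # ignore unseen tokens
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.each_with_object({}) do |label, out|
      score = Math.log(@doc_counts[label] / total_docs) # log prior
      denom = @token_totals[label] + @vocab.size         # Laplace smoothing denominator
      tokens.each { |tok| score += Math.log((@word_counts[label][tok] + 1.0) / denom) }
      @rules.each do |rule_label, weight, pred|
        score += weight if rule_label == label && pred.call(text, context)
      end
      out[label] = score
    end
  end

  # Label with the highest score.
  def classify(text, context = {})
    scores(text, context).max_by { |_label, score| score }.first
  end
end
```

Adding rule weights directly to the log scores keeps the manual detective work (IP reuse, known fingerprints) in the same units as the learned evidence, so a strong rule can override weak textual evidence. At the corpus sizes described, a real version would need its counts stored outside process memory.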

I realize this is a long and vague question, but what I'm looking for primarily is just a list of good packages, and secondarily any random thoughts from someone who has built a similar system about ways to approach it.

asked Jun 03 '11 21:06 by gtd


1 Answer

We looked for an acceptable open source solution and didn't find one.

If you come to the same conclusion and decide to consider proprietary anti-spam, check out Akismet, a paid collaborative spam filtering service. We've had decent performance from it across a dozen medium-sized sites. It integrates with Rails through Rack and Rackismet.
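The answer doesn't show how Rackismet is configured, so its API isn't reproduced here. For orientation, this is a hypothetical sketch of the kind of request an Akismet client sends: a POST to Akismet's `comment-check` endpoint using only Ruby's standard `Net::HTTP`. The endpoint URL and form fields (`blog`, `user_ip`, `user_agent`, `comment_type`, `comment_content`) follow Akismet's REST documentation as we understand it and should be checked against the current docs; the `AkismetCheck` module and the mapping from UGC types to `comment_type` values are assumptions.

```ruby
require "net/http"
require "uri"

# Hypothetical helper, not Rackismet's API: a direct call to Akismet's
# comment-check endpoint. Endpoint and field names are assumed from
# Akismet's REST docs; verify against the current documentation.
module AkismetCheck
  # Assumed mapping from our UGC types to Akismet comment_type values.
  COMMENT_TYPES = {
    forum_post: "forum-post",
    blog_comment: "comment",
    private_message: "message"
  }.freeze

  # Builds the form fields for one piece of user-generated content.
  # Unmapped types fall back to the generic "comment".
  def self.build_params(blog:, ip:, user_agent:, type:, content:)
    {
      "blog" => blog,
      "user_ip" => ip,
      "user_agent" => user_agent,
      "comment_type" => COMMENT_TYPES.fetch(type, "comment"),
      "comment_content" => content
    }
  end

  # Akismet replies with the literal body "true" (spam) or "false" (ham);
  # anything else (such as an error for a bad key) is treated as a failure.
  def self.spam_response?(body)
    case body.to_s.strip
    when "true" then true
    when "false" then false
    else raise ArgumentError, "unexpected Akismet response: #{body.inspect}"
    end
  end

  # Sends the check over HTTPS; api_key is your Akismet key.
  def self.spam?(api_key, params)
    uri = URI("https://#{api_key}.rest.akismet.com/1.1/comment-check")
    response = Net::HTTP.post_form(uri, params)
    spam_response?(response.body)
  end
end
```

Keeping `build_params` and `spam_response?` separate from the network call lets the type mapping and response parsing be tested without contacting Akismet.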

answered Oct 04 '22 03:10 by Mori