
How to write spam filter

I have to write a simple spam filter, and I'm not really sure how to go about it.

So far I've come up with word-list and domain filtering: each rule adds or removes points, and a message counts as spam once its score reaches a certain threshold.

For example, a message mentioning "v1agr4" from a blacklisted domain gets 2 spam points, but the same message from a hotmail.com account gets only 1 spam point.
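A minimal sketch of that point scheme in Python might look like the following. The word list, the blacklisted domain `spam.example`, the one-point weights, and the threshold of 2 are all assumptions chosen to match the example above, not values from any real filter.

```python
import re

# Hypothetical rule data: every entry and weight below is an assumption.
SPAM_WORDS = {"v1agr4"}
BLACKLISTED_DOMAINS = {"spam.example"}
SPAM_THRESHOLD = 2  # a message with this many points or more is flagged


def spam_score(sender: str, body: str) -> int:
    """Return the number of spam points for one message."""
    # Everything after the last "@" is treated as the sender's domain.
    domain = sender.rsplit("@", 1)[-1].lower()
    # Split the body into lowercase alphanumeric tokens.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))

    score = 0
    if words & SPAM_WORDS:
        score += 1  # one point for any word-list hit
    if domain in BLACKLISTED_DOMAINS:
        score += 1  # one more point for a blacklisted sender domain
    return score


def is_spam(sender: str, body: str) -> bool:
    """Flag the message once its score reaches the threshold."""
    return spam_score(sender, body) >= SPAM_THRESHOLD
```

With these assumed rules, `spam_score("bob@spam.example", "Buy v1agr4 now")` gives 2 points and the same text from a hotmail.com address gives 1. Rules that remove points (for example, a trusted-sender list) could subtract from `score` in the same way.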

Do you guys have any other suggestions or resources?

This is more about learning how spam filters work than developing something enterprise-grade.

Eric asked Nov 17 '08 19:11





3 Answers

Some really good algorithm info here:

http://www.paulgraham.com/spam.html

http://www.paulgraham.com/better.html

But, seriously, why reinvent the wheel?

Just download K9: http://keir.net/k9.html

BoltBait answered Oct 07 '22 21:10



Some open source Java projects related to Bayesian Spam Filtering (that was mentioned by LFSR Consulting):

  • Classifier4j
  • jBNC
  • Naiban

And one extra for C++:

  • SpamProbe
Touko answered Oct 07 '22 23:10



Look into Bayesian Spam Filtering.

I know Perl has a library for it, so I'd assume Java has one too.
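For learning purposes, the core idea fits in a few dozen lines. Below is a rough sketch of a naive Bayes classifier with add-one smoothing, written in Python rather than Perl or Java. It is not the API of any particular library: the class name `NaiveBayesSpamFilter`, the `tokenize` helper, and the "spam"/"ham" labels are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical minimal naive Bayes filter using log probabilities."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled message; label must be "spam" or "ham"."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text) under the naive independence assumption."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)

        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior: how common this class was in training.
            log_score = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            total_words = sum(self.word_counts[label].values())
            # The extra +1 reserves probability mass for unseen words.
            denominator = total_words + len(vocabulary) + 1
            for word in tokens:
                count = self.word_counts[label][word]  # 0 if never seen
                log_score += math.log((count + 1) / denominator)
            log_scores[label] = log_score

        # Turn the two log scores into a probability; cap the exponent
        # so math.exp cannot overflow on very long messages.
        diff = min(log_scores["ham"] - log_scores["spam"], 700)
        return 1 / (1 + math.exp(diff))
```

You would call `train()` on a set of hand-labelled messages and then treat anything whose `spam_probability()` is above some cutoff as spam. The cutoff is a tuning choice, and a real filter needs far more training data than a toy example.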

Gavin Miller answered Oct 07 '22 23:10
