
Filter Comment Spam? PHP

I'm looking for articles on ways to filter spam. When I search around, all I keep finding is WordPress, ways to filter swear words, etc., which is not what I'm looking for. I'm looking for ways to write your own filter system, along with best practices.

Any tutorial links from anyone who has done this before would be appreciated.

The only good article I can find so far is http://snook.ca/archives/other/effective_blog_comment_spam_blocker

Sean H Jenkins asked Dec 07 '11 17:12


2 Answers

When writing your own method, you'll have to employ a combination of heuristics.

For example, it's very common for spam comments to have 2 or more URL links.
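
As a rough illustration of that link heuristic, a link counter might look like the sketch below. The function names `countLinks` and `hasTooManyLinks`, the regular expression, and the default limit of one link are assumptions made for this example, not part of any library.

```php
<?php
// Hypothetical helper: counts URL-like tokens in a comment.
function countLinks(string $text): int
{
    // Matches tokens starting with http://, https:// or www. (case-insensitive).
    // \S+ consumes the rest of the token, so "http://www.example.com" counts once.
    return (int) preg_match_all('~\b(?:https?://|www\.)\S+~i', $text);
}

// Hypothetical rule: more than $maxLinks links makes the comment suspicious.
function hasTooManyLinks(string $text, int $maxLinks = 1): bool
{
    return countLinks($text) > $maxLinks;
}
```

With the default limit, `hasTooManyLinks()` returns true for a comment containing two or more links, which you could treat as one signal among several rather than as a verdict.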

I'd start with a dictionary of trigger words, loop through it, and use the number of matches to estimate the probability of spam:

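// Returns a rough spam score: 0.2 for every occurrence of a trigger word.
// The score is not capped, so it can exceed 1.
function spamProbability($text){
    $probability = 0;
    $text = strtolower($text); // lowercase so matching is case-insensitive
    $myDict = array("http","penis","pills","sale","cheapest");
    foreach($myDict as $word){
        // substr_count matches substrings, so "sale" also matches "wholesale"
        $count = substr_count($text, $word);
        $probability += .2 * $count;
    }
    return $probability;
}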

Note that this method will result in many false positives, depending on your word set. To soften that, you could act on the score in tiers: comments with a probability between .3 and .6 go live immediately but are flagged for moderation, those between .6 and .9 enter a moderation queue (where they don't appear until approved), and anything over 1 is simply rejected.
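
A minimal sketch of that tiered handling follows. The function name `moderationAction` and the action labels are hypothetical. The answer's ranges leave the exact boundaries and the .9 to 1 band unspecified, so this sketch assumes inclusive lower bounds and sends everything from .6 up to and including 1 to the queue.

```php
<?php
// Hypothetical router from a spam score to a moderation action.
// Assumption: boundary values and the 0.9..1.0 gap are resolved as noted above.
function moderationAction(float $score): string
{
    if ($score > 1.0) {
        return 'reject';   // never stored or shown
    }
    if ($score >= 0.6) {
        return 'queue';    // hidden until a moderator approves it
    }
    if ($score >= 0.3) {
        return 'flag';     // goes live, but marked for review
    }
    return 'publish';      // goes live with no extra handling
}
```

You would call it with the result of the scoring function, for example `moderationAction(spamProbability($comment))`, and adjust the cut-offs once you see how real comments score.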

Obviously you'll have to tweak these thresholds, but this should start you off with a pretty basic system. You can add several other qualifiers that raise or lower the probability of spam, such as checking the ratio of bad words to total words, giving different words different weights, etc.
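
Two of those qualifiers, per-word weights and the ratio of bad words to total words, could be sketched as below. The constant `SPAM_WEIGHTS`, its values, and both function names are assumptions chosen for illustration; real weights would need tuning against your own comments.

```php
<?php
// Hypothetical weights: stronger spam signals get a larger value.
const SPAM_WEIGHTS = [
    'http'     => 0.2,
    'pills'    => 0.4,
    'cheapest' => 0.3,
    'sale'     => 0.1,
];

// Sums weight * occurrences for every trigger word (substring match).
function weightedSpamScore(string $text, array $weights = SPAM_WEIGHTS): float
{
    $text = strtolower($text);
    $score = 0.0;
    foreach ($weights as $word => $weight) {
        $score += $weight * substr_count($text, $word);
    }
    return $score;
}

// Fraction of whole words in the text that are trigger words (0.0 to 1.0).
function triggerWordRatio(string $text, array $weights = SPAM_WEIGHTS): float
{
    $words = str_word_count(strtolower($text), 1); // list of words in the text
    if (count($words) === 0) {
        return 0.0; // avoid division by zero on empty input
    }
    $hits = 0;
    foreach ($words as $word) {
        if (isset($weights[$word])) {
            $hits++;
        }
    }
    return $hits / count($words);
}
```

The ratio helps with long, legitimate comments that happen to contain one trigger word: a single "sale" in 200 words gives a tiny ratio, while a four-word comment made up mostly of trigger words gives a high one.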

Tim answered Oct 29 '22 22:10


I'm surprised no one mentioned Akismet. I've never had it classify a message incorrectly, whether spam or legitimate. My WordPress install came with it. All I had to do was hit enable.

Brigand answered Oct 29 '22 22:10
