I provided some of my programs with a feedback function. Unfortunately I forgot to include any spam protection, so users can send anything they want to my server, where every piece of feedback is stored in a huge database.
In the beginning I checked the feedback periodically, keeping what was usable and deleting the garbage. The problem is that I get 900 messages per day. Only 4-5 are really useful; the rest are mostly two types of gibberish:
What I did so far:
I installed a filter to delete any feedback containing "asdf", "qwer" etc. -> only 700 per day
I installed a word filter to delete anything containing bad language -> 600 per day (don't ask, but there are many strange people out there)
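For reference, the two filters described above amount to something like the following sketch. The block lists here are placeholders (the real lists are not given in this post), and the function name `is_garbage` is made up for illustration.

```python
import re

# Hypothetical block lists; the actual lists are not part of the post.
KEYBOARD_MASH = ("asdf", "qwer")
BAD_WORDS = ("darn",)  # placeholder entry


def is_garbage(message: str) -> bool:
    """Return True if the message trips either of the two filters."""
    text = message.lower()
    # Filter 1: keyboard mashing, matched anywhere in the text.
    if any(fragment in text for fragment in KEYBOARD_MASH):
        return True
    # Filter 2: bad language, matched on whole words only so that
    # harmless words containing a blocked word are not rejected.
    words = set(re.findall(r"[a-z']+", text))
    return any(word in words for word in BAD_WORDS)
```

Substring matching suits the keyboard-mash fragments, while whole-word matching keeps the bad-language filter from rejecting legitimate messages by accident.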
But 400 per day is still way too much. So I'm wondering if anybody has dealt with such a problem before and knows some sort of algorithm to filter out senseless messages.
Any help would really be appreciated!
How about just using an existing implementation of a Bayesian spam filter instead of implementing your own? I have had good results with DSPAM.
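To illustrate the idea behind a Bayesian filter, here is a minimal naive Bayes sketch in Python. It is a toy, not how DSPAM or any other existing filter is implemented; the class name, the labels "useful" and "junk", and the tokenizer are all assumptions made for the example. You would train it on feedback you have already sorted by hand.

```python
import math
import re
from collections import Counter

LABELS = ("useful", "junk")  # hypothetical label names


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        """Record one hand-labelled message."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label, vocab_size):
        # log P(label) + sum of log P(token | label), with add-one smoothing
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the higher posterior score."""
        if any(self.message_counts[label] == 0 for label in LABELS):
            raise ValueError("train at least one message per label first")
        tokens = tokenize(text)
        vocab = set()
        for label in LABELS:
            vocab.update(self.word_counts[label])
        vocab_size = len(vocab) + 1  # +1 leaves room for unseen words
        scores = {
            label: self._log_score(tokens, label, vocab_size)
            for label in LABELS
        }
        return max(scores, key=scores.get)
```

Usage would look like `f = NaiveBayesFilter()`, a series of `f.train(text, "junk")` and `f.train(text, "useful")` calls on already-sorted feedback, and then `f.classify(new_text)`. With only 4-5 useful messages a day the classes are very unbalanced, so it is probably safer to review whatever gets classified as junk for a while before deleting it automatically.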
A slightly different approach would be to set up a system that emails the feedback messages to an account and use standard spam filtering. You could send them through Gmail and let its filtering take a shot at it. Not perfect, but not too much effort to implement either.
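A rough sketch of the forwarding step, using only Python's standard library. The function names, the subject format, and all addresses and SMTP settings are placeholders, and the server is assumed to accept STARTTLS plus a username and password; adjust to whatever your mail provider actually requires.

```python
import smtplib
from email.message import EmailMessage


def build_feedback_email(feedback_text, program_name, sender, recipient):
    """Wrap one feedback message in a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Feedback: {program_name}"
    msg.set_content(feedback_text)
    return msg


def send_feedback_email(msg, host, port, username, password):
    """Send the email through an SMTP server (assumes STARTTLS and login)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check what would be sent before pointing it at a real mailbox.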