I was wondering if there is any good and clean object-oriented programming (OOP) implementation of Bayesian filtering for spam and text classification? This is just for learning purposes.
How it works. A Bayesian filter works by comparing your incoming email with a database of emails, which are categorised into 'spam' and 'not spam'. Bayes' theorem is used to learn from these prior messages. Then, the filter can calculate a spam probability score against each new message entering your inbox.
Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification.
Spam Filtering With Bayes' Rule, we want to find the probability an email is spam, given it contains certain words. We do this by finding the probability that each word in the email is spam, and then multiply these probabilities together to get the overall email spam metric to be used in classification.
Bayesian spam filtering is based on Bayes rule, a statistical theorem that gives you the probability of an event. In Bayesian filtering it is used to give you the probability that a certain email is spam.
I definitely recommend Weka which is an Open Source Data Mining Software written in Java:
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
As mentioned above, it ships with a bunch of different classifiers like SVM, Winnow, C4.5, Naive Bayes (of course) and many more (see the API doc). Note that a lot of classifiers are known to have much better perfomance than Naive Bayes in the field of spam detection or text classification.
Furthermore Weka brings you a very powerful GUI…
Check out Chapter 6 of Programming Collective Intelligence
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With