 

Decision tree vs. Naive Bayes classifier [closed]

I am doing some research about different data mining techniques and came across something that I could not figure out. If anyone has any idea, that would be great.

In which cases is it better to use a decision tree, and in which cases a Naive Bayes classifier?

Why use one of them in certain cases and the other in different cases? (Looking at their functionality, not at the algorithms.)

Does anyone have explanations or references about this?

Asked Apr 25 '12 by Y2theZ


1 Answer

Decision trees are very flexible, easy to understand, and easy to debug. They work with both classification and regression problems, so whether you are trying to predict a categorical value like (red, green, up, down) or a continuous value like 2.9, 3.4, etc., decision trees will handle both. Probably one of the coolest things about decision trees is that they only need a table of data and will build a classifier directly from that data without any up-front design work. To some degree, properties that don't matter won't be chosen as splits and will eventually get pruned away, so the method is very tolerant of nonsense. To start with, it's set-it-and-forget-it.
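To make the "table of data in, classifier out" point concrete, here is a minimal sketch; scikit-learn and the toy numbers are assumptions of mine, not something the answer specifies.

```python
# Minimal sketch of the "hand it a table of data" workflow, assuming
# scikit-learn (the answer names no particular library).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Tabular features: each row is an observation, each column a property.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [6.7, 3.1]]

# Categorical target (classification) ...
y_class = ["red", "red", "green", "green"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[6.0, 3.0]]))   # e.g. ['green']

# ... or continuous target (regression) -- same table, same workflow.
y_reg = [2.9, 3.4, 5.1, 5.6]
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[6.0, 3.0]]))   # e.g. [5.1]
```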

However, there are downsides. Simple decision trees tend to overfit the training data more than other techniques, which means you generally have to do tree pruning and tune the pruning procedures. You didn't have any up-front design cost, but you'll pay it back in tuning the tree's performance. Also, simple decision trees divide the data into squares, so building clusters around things means the tree has to split a lot to encompass clusters of data. Splitting a lot leads to complex trees and raises the probability that you are overfitting. Tall trees get pruned back, so while you can build a cluster around some feature in the data, it might not survive the pruning process. There are other techniques, like linear combination (oblique) splits, which let you split along several variables at once, creating splits in the space that aren't horizontal or vertical (0 < slope < infinity). Cool, but your tree starts to become harder to understand, and it's complex to implement these algorithms. Other techniques like boosting and random forests can perform quite well, and some feel these techniques are essential to getting the best performance out of decision trees. Again, this adds more things to understand and tune, and hence more things to implement. In the end, the more we add to the algorithm, the higher the barrier to using it.
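As a rough illustration of that pruning/tuning cost, here is a sketch using cost-complexity pruning in scikit-learn; the library, dataset, and pruning mechanism are my assumptions rather than anything the answer prescribes.

```python
# Sketch: tuning how aggressively a decision tree is pruned back
# (cost-complexity pruning), assuming scikit-learn and a bundled dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate pruning strengths suggested by the training data itself.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

for alpha in path.ccp_alphas[::5]:   # try a handful of values
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  depth={tree.get_depth()}  "
          f"test accuracy={tree.score(X_te, y_te):.3f}")
```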

Naive Bayes requires you to build the classification by hand. There's no way to just toss a bunch of tabular data at it and have it pick the best features to classify with; picking which features matter is up to you. Decision trees will pick the best features for you from tabular data. If there were a way for Naive Bayes to pick features, you'd be getting close to the same techniques that make decision trees work like that. Given this, you may need to combine Naive Bayes with other statistical techniques to guide you toward which features classify best, and that could mean using decision trees. Naive Bayes answers as a continuous classifier: there are techniques to adapt it to categorical prediction, but it will answer in terms of probabilities like (A 90%, B 5%, C 2.5%, D 2.5%). Bayes can perform quite well, and it doesn't overfit nearly as much, so there is no need to prune or post-process the model. That makes it a simpler algorithm to implement. However, it is harder to debug and understand, because it's all probabilities getting multiplied thousands of times, so you have to be careful to test that it's doing what you expect. Naive Bayes does quite well when the training data doesn't contain all possibilities, so it can be very good with small amounts of data. Decision trees work better with lots of data compared to Naive Bayes.
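For the probability-style answers described above, here is a minimal sketch with scikit-learn's GaussianNB; the hand-picked feature columns and the library are assumptions for illustration.

```python
# Sketch: Naive Bayes answering in probabilities over features you chose
# yourself, assuming scikit-learn's GaussianNB.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# You decide up front which columns (features) to feed the classifier.
X = np.array([[1.0, 0.2], [1.1, 0.1], [3.0, 2.2], [2.9, 2.4]])
y = np.array(["A", "A", "B", "B"])

nb = GaussianNB().fit(X, y)

# The answer comes back as a probability per class, e.g. (A 90%, B 10%).
probs = nb.predict_proba([[1.05, 0.15]])
for label, p in zip(nb.classes_, probs[0]):
    print(f"{label}: {p:.1%}")
```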

Naive Bayes is used a lot in robotics and computer vision and does quite well with those tasks. Decision trees perform very poorly in those situations. Teaching a decision tree to recognize poker hands by looking at millions of hands works very poorly, because royal flushes and quads occur so rarely that they often get pruned out. If they're pruned out of the resulting tree, it will misclassify those important hands (recall the tall-trees discussion above). Now think about trying to diagnose cancer with this: cancer doesn't occur in the population in large amounts, so it is more likely to get pruned out. The good news is that this can be handled by using weights: we weight a winning hand, or having cancer, higher than a losing hand or not having cancer, and that boosts it up the tree so it won't get pruned out. Again, this is part of tuning the resulting tree to the situation that I discussed earlier.
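Here is a small sketch of that weighting idea; using scikit-learn's class_weight parameter is my choice of mechanism, since the answer only says "weights".

```python
# Sketch: weighting a rare-but-important class (the "royal flush" or
# "has cancer" case) so the tree doesn't treat it as prunable noise.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # class 1 is the rare, important one

# Errors on the rare class now cost 50x more, so splits that isolate it
# are much less likely to be discarded when the tree is grown or pruned.
weighted = DecisionTreeClassifier(class_weight={0: 1, 1: 50}).fit(X, y)
print(weighted.predict([[9]]))   # e.g. [1]
```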

Decision trees are neat because they tell you which inputs are the best predictors of the outputs, so a decision tree can often guide you to whether there is a statistical relationship between a given input and the output, and how strong that relationship is. Often the resulting decision tree is less important than the relationships it describes. So decision trees can be used as a research tool as you learn about your data, helping you build other classifiers.
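A quick sketch of reading those relationships off a fitted tree, assuming scikit-learn's feature_importances_ attribute and its bundled iris dataset (both are my assumptions).

```python
# Sketch: using a fitted tree as a research tool -- which inputs drove
# the splits?  Assumes scikit-learn and its built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Higher importance = the feature was more useful for splitting.
for name, importance in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```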

If you are deciding between decision trees and Naive Bayes to solve a problem, it's often best to test each one. Build a decision tree and build a Naive Bayes classifier, then have a shoot-out using the training and validation data you have. Whichever performs best will more likely perform better in the field. And it's always a good idea to cast each of those against k-nearest neighbors (KNN), because KNN has been shown to outperform both of them in some situations, and it is a simple algorithm to implement and use. If KNN performs better than the other two, go with it.
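A minimal sketch of such a shoot-out, assuming scikit-learn's cross-validation helpers and one of its bundled datasets (the scoring setup is my choice, not the answer's).

```python
# Sketch: a simple shoot-out between a decision tree, Naive Bayes, and KNN,
# assuming scikit-learn; whichever cross-validates best is the safer bet.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
contenders = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive bayes":   GaussianNB(),
    "knn":           KNeighborsClassifier(n_neighbors=5),
}

for name, model in contenders.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```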

Some sources:

The manual on CART-based decision trees. This book covers the CART algorithm, but also discusses decision trees, weights, missing values, surrogate splits, boosting, etc. http://www.amazon.com/Classification-Regression-Wadsworth-Statistics-Probability/dp/0412048418

A gentler intro to CART https://www.youtube.com/watch?v=p17C9q2M00Q

Comparison of algorithms - notice that KNN, Decision Trees, C4.5, and SVM do quite well on most of the tests. http://www4.ncsu.edu/~arezaei2/paper/JCIT4-184028_Camera%20Ready.pdf

Another comparison of algorithms - boosted decision trees and random forests top the list, with KNN in the middle: http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf

Another good run down of various techniques: http://www.quora.com/What-are-the-advantages-of-different-classification-algorithms

Answered Nov 10 '22 by chubbsondubs