Why does the C4.5 algorithm use pruning to reduce the decision tree, and how does pruning affect prediction accuracy?

Tags: decision-tree, weka, pruning

I have searched on Google about this issue and I can't find anything that explains this algorithm in a simple yet detailed way.

For instance, I know the ID3 algorithm doesn't use pruning at all, so if you have a continuous attribute, the prediction success rate will be very low.

So C4.5 uses pruning in order to support continuous attributes, but is this the only reason?

Also, I can't really understand how exactly the confidence factor in the WEKA application affects the accuracy of the predictions. The smaller the confidence factor, the more pruning the algorithm will do, but what is the correlation between pruning and prediction accuracy? The more you prune, are the predictions better or worse?

Thanks

548

asked Jun 02 '12 19:06

ksm001

1 Answer

Pruning is a way of reducing the size of the decision tree. This will reduce the accuracy on the training data, but (in general) increase the accuracy on unseen data. It is used to mitigate overfitting, where you achieve perfect accuracy on the training data, but the model (i.e. the decision tree) you learn is so specific that it doesn't apply to anything but that training data.

In general, if you increase pruning, the accuracy on the training set will be lower. WEKA does, however, offer several ways to estimate the accuracy on unseen data, namely a training/test split or cross-validation. If you use cross-validation, for example, you'll discover a "sweet spot" for the pruning confidence factor: a value at which it prunes enough to make the learned decision tree sufficiently accurate on test data, but doesn't sacrifice too much accuracy on the training data. Where this sweet spot lies, however, will depend on your actual problem, and the only way to determine it reliably is to try.
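To make the link between the confidence factor and the amount of pruning more concrete, here is a minimal sketch in Python. It assumes the normal-approximation upper bound on the error rate that textbooks commonly give for C4.5-style pruning; WEKA's actual implementation may differ in detail. The function names `pessimistic_error` and `should_prune` and the example counts are made up for illustration, not part of WEKA.

```python
from math import sqrt
from statistics import NormalDist


def pessimistic_error(errors, n, cf=0.25):
    """Upper confidence bound on a node's true error rate.

    errors: training instances misclassified at the node
    n:      training instances reaching the node
    cf:     confidence factor in (0, 0.5]; smaller means more pessimistic
    Assumption: textbook normal approximation, not WEKA's exact code.
    """
    z = NormalDist().inv_cdf(1.0 - cf)  # cf = 0.25 gives z of about 0.67
    f = errors / n                      # observed error rate on training data
    spread = sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + z * spread) / (1 + z * z / n)


def should_prune(parent_errors, leaves, cf=0.25):
    """Decide whether to replace a subtree with a single leaf.

    parent_errors: errors the node would make if collapsed into one leaf
    leaves:        list of (errors, n) pairs, one per leaf of the subtree
    """
    total_n = sum(n for _, n in leaves)
    # Estimated error of the subtree: instance-weighted average over its leaves.
    subtree = sum(n * pessimistic_error(e, n, cf) for e, n in leaves) / total_n
    # Estimated error if the whole subtree becomes one leaf.
    as_leaf = pessimistic_error(parent_errors, total_n, cf)
    return as_leaf <= subtree


# Hypothetical subtree with three leaves covering 6, 2 and 6 instances.
leaves = [(2, 6), (1, 2), (2, 6)]
print(pessimistic_error(2, 6, cf=0.25))
print(pessimistic_error(2, 6, cf=0.10))  # smaller cf, larger estimate
print(should_prune(5, leaves))
```

A smaller confidence factor inflates the estimated error of every node, and small leaves are penalised the most, which is why lowering it tends to produce more pruning. Whether that helps on unseen data is still something to check with cross-validation, as described above.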

182

answered Sep 30 '22 07:09

Lars Kotthoff
