In the J48 decision tree example, what is the difference between a pruned and an unpruned tree?
The unpruned trees are larger. What happens is that the tree is first grown according to the implemented algorithm, and if pruning is enabled, an additional step looks at which nodes/branches can be removed without affecting the performance too much.

The idea behind pruning is that, apart from making the tree easier to understand, it reduces the risk of overfitting to the training data. That is, the tree can classify the training data (almost) perfectly but nothing else, because instead of learning the underlying concept it has learned properties that are intrinsic and specific to the training data.
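In Weka, pruning is controlled through the J48 options (an unpruned tree corresponds to the -U flag). Below is a minimal sketch, assuming the Weka Java API, of how one might compare a pruned and an unpruned tree side by side; the dataset path "iris.arff" is just a placeholder:

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class PrunedVsUnpruned {
    public static void main(String[] args) throws Exception {
        // Load a dataset in ARFF format (the path is a placeholder).
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Default J48: pruning enabled.
        J48 pruned = new J48();

        // Same learner with pruning switched off (equivalent to the -U option).
        J48 unpruned = new J48();
        unpruned.setUnpruned(true);

        for (J48 tree : new J48[] {pruned, unpruned}) {
            // 10-fold cross-validation to compare generalisation performance.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));

            // Build on the full data to report tree size and number of leaves.
            tree.buildClassifier(data);
            System.out.printf("unpruned=%b  size=%.0f  leaves=%.0f  CV accuracy=%.2f%%%n",
                    tree.getUnpruned(), tree.measureTreeSize(),
                    tree.measureNumLeaves(), eval.pctCorrect());
        }
    }
}
```

Typically the unpruned tree will report a larger size and leaf count, while the cross-validated accuracy of the pruned tree is equal or better, which is the overfitting effect described above.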
I would like to add the following to Lars' answer, taken from the following link:
Many algorithms attempt to "prune", or simplify, their results. Pruning produces fewer, more easily interpreted results. More importantly, pruning can be used as a tool to correct for potential overfitting. ...
J48 employs two pruning methods.
The first is known as subtree replacement. This means that nodes in a decision tree may be replaced with a leaf -- basically reducing the number of tests along a certain path. This process starts from the leaves of the fully formed tree, and works backwards toward the root.
The second type of pruning used in J48 is termed subtree raising. In this case, a node may be moved upwards towards the root of the tree, replacing other nodes along the way. Subtree raising often has a negligible effect on decision tree models. There is often no clear way to predict the utility of the option, though it may be advisable to try turning it off if the induction process is taking a long time. This is because subtree raising can be somewhat computationally complex.
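Subtree raising can be toggled directly on a J48 instance (disabling it corresponds to the -S option). A small sketch, assuming the Weka Java API and a placeholder dataset path, that compares induction time and tree size with and without subtree raising:

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SubtreeRaisingDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder dataset path.
        Instances data = DataSource.read("soybean.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Pruned tree with subtree raising (the default).
        J48 withRaising = new J48();
        withRaising.setSubtreeRaising(true);

        // Pruned tree with subtree raising disabled, which can speed up induction.
        J48 withoutRaising = new J48();
        withoutRaising.setSubtreeRaising(false);

        for (J48 tree : new J48[] {withRaising, withoutRaising}) {
            long start = System.currentTimeMillis();
            tree.buildClassifier(data);
            long elapsed = System.currentTimeMillis() - start;
            System.out.printf("subtreeRaising=%b  size=%.0f  time=%dms%n",
                    tree.getSubtreeRaising(), tree.measureTreeSize(), elapsed);
        }
    }
}
```

On many datasets the resulting trees are nearly identical, which is consistent with the quoted observation that subtree raising often has a negligible effect on the model.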