 

How to prune a tree in R?

I'm doing classification with rpart in R. The tree model is trained and used for prediction as follows:

> tree <- rpart(activity ~ . , data=trainData)
> pData1 <- predict(tree, testData, type="class")

The accuracy for this tree model is:

> sum(testData$activity==pData1)/length(pData1)
[1] 0.8094276

Following a tutorial, I pruned the tree using cross-validation:

> ptree <- prune(tree,cp=tree$cptable[which.min(tree$cptable[,"xerror"]),"CP"])
> pData2 <- predict(ptree, testData, type="class")

The accuracy rate for the pruned tree is still the same:

> sum(testData$activity==pData2)/length(pData2)
[1] 0.8094276

What's wrong with my pruned tree, and how can I prune the tree model using cross-validation in R? Thanks.

asked Mar 10 '13 at 03:03 by zfz



1 Answer

You have used the tree with the minimum cross-validated error. An alternative is to use the smallest tree whose cross-validated error is within 1 standard error of that best tree (the one you are selecting). The reasoning is that, given the uncertainty in the CV estimates of the error, the smallest tree within 1 standard error predicts about as well as the best (lowest CV error) tree, yet it does so with fewer "terms" (splits).

Plot the cross-validated error against the complexity parameter (cp) and tree size for the unpruned tree via:

plotcp(tree)

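Then find the tree to the left of the one with minimum error (i.e. a smaller tree with a larger cp) whose cross-validated error still lies within the error bar of the minimum-error tree. By default plotcp draws a horizontal line 1 SE above the minimum of the curve to help with this.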
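The same 1-SE choice can be made programmatically from the tree's cptable instead of reading it off the plot. This is a minimal sketch, assuming the built-in iris data as a stand-in for the question's trainData and a fixed seed so the cross-validation folds are reproducible. It relies on the cptable columns "CP", "nsplit", "xerror" and "xstd", and on the rows running from the smallest tree (largest CP) to the largest tree.

```r
library(rpart)

# Assumption: iris stands in for the question's trainData; the seed fixes
# the random CV folds, and therefore the xerror/xstd columns.
set.seed(42)
fit <- rpart(Species ~ ., data = iris)

# Rows go from the smallest tree (largest CP) to the largest tree.
cpt <- fit$cptable

# Tree with the minimum cross-validated error (what the question selected).
best <- which.min(cpt[, "xerror"])

# 1-SE threshold: minimum xerror plus its standard error.
threshold <- cpt[best, "xerror"] + cpt[best, "xstd"]

# First (i.e. smallest) tree whose xerror is within the threshold.
# The minimum-error row always qualifies, so this is never NA.
one_se <- which(cpt[, "xerror"] <= threshold)[1]

pruned_min <- prune(fit, cp = cpt[best, "CP"])
pruned_1se <- prune(fit, cp = cpt[one_se, "CP"])

cat("splits in min-xerror tree:", cpt[best, "nsplit"], "\n")
cat("splits in 1-SE tree:      ", cpt[one_se, "nsplit"], "\n")
```

Because xerror comes from random cross-validation folds, a different seed can change which row is selected, so it is worth checking the result against plotcp(fit).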

There could be several reasons why pruning does not change the fitted tree. For example, the best tree could be the full tree at which growth stopped according to the stopping rules specified in ?rpart.control (by default cp = 0.01 and minsplit = 20), in which case there is nothing left for prune() to remove.
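One way to check this, and to give cross-validation something to prune, is to inspect the CP table of the default tree and then grow a deliberately larger tree by relaxing the stopping rules before pruning. This is a sketch, not a recommended setting: iris again stands in for the question's data, and cp = 0 and minsplit = 2 are illustrative values only (xval = 10 is rpart's default number of CV folds). n_splits is a hypothetical helper defined here for the comparison.

```r
library(rpart)

set.seed(1)

# Default stopping rules (cp = 0.01, minsplit = 20) may stop growth early,
# so the unpruned tree can already be the minimum-xerror tree.
default_fit <- rpart(Species ~ ., data = iris)
printcp(default_fit)

# Illustrative only: relax the stopping rules to grow a large tree,
# then let cross-validation decide how far to prune it back.
big_fit <- rpart(Species ~ ., data = iris,
                 control = rpart.control(cp = 0, minsplit = 2, xval = 10))
printcp(big_fit)

best_cp <- big_fit$cptable[which.min(big_fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_fit, cp = best_cp)

# Hypothetical helper: count internal (non-leaf) nodes, i.e. splits.
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")

print(c(default = n_splits(default_fit),
        big     = n_splits(big_fit),
        pruned  = n_splits(pruned)))
```

If the minimum xerror of default_fit falls on its last row, pruning it at that CP leaves the tree unchanged, which would explain identical test accuracy before and after pruning.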

answered Oct 19 '22 at 15:10 by Gavin Simpson
