Tree sizes given by CP table in rpart

Tags: r, tree, decision-tree, cross-validation, rpart

In the R package rpart, what determines the sizes of the trees presented in the CP table for a decision tree? In the example below, the CP table by default presents only trees with 1, 2, and 5 terminal nodes (nsplit = 0, 1, and 4 respectively).

    > library(rpart)
    > fit <- rpart(Kyphosis ~ Age + Number + Start, method="class", data=kyphosis)
    > printcp(fit)

    Classification tree:
    rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis,
        method = "class")

    Variables actually used in tree construction:
    [1] Age   Start

    Root node error: 17/81 = 0.20988

    n= 81

            CP nsplit rel error  xerror    xstd
    1 0.176471      0   1.00000 1.00000 0.21559
    2 0.019608      1   0.82353 0.94118 0.21078
    3 0.010000      4   0.76471 0.94118 0.21078

Is there an inherent rule that rpart() uses to determine which tree sizes to present? And is it possible to force printcp() to return cross-validation statistics for all possible tree sizes, i.e. for the above example, to also include rows for trees with 3 and 4 terminal nodes (nsplit = 2, 3)?

asked Jan 09 '15 14:01

alopex

2 Answers

The rpart() function is controlled through rpart.control(). Its parameters include minsplit, which tells the function to attempt a split only when a node holds at least that many observations, and cp, which tells it not to attempt any split that fails to decrease the overall lack of fit by a factor of cp. If you run summary(fit) on your example, it prints details for every node in the fitted tree, although its CP table has the same rows as printcp(fit). To get more rows in the printcp(fit) output, you need to choose appropriate values of cp and minsplit when calling rpart() in the first place.
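As a minimal sketch of that advice, the snippet below refits the question's model with looser control values and compares the nsplit column of the two CP tables. The values cp = 0 and minsplit = 2 are chosen here only for illustration and are not a recommendation; the table itself is read from the cptable component of the fitted object.

```r
library(rpart)

# Fit with the default control values (cp = 0.01, minsplit = 20)
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)

# The CP table is stored on the fitted object as a matrix
fit$cptable[, "nsplit"]

# Looser settings (illustrative values only) let rpart grow a larger tree
fit_big <- rpart(Kyphosis ~ Age + Number + Start, method = "class",
                 data = kyphosis,
                 control = rpart.control(cp = 0, minsplit = 2))

# Compare which tree sizes now have a row in the table
fit_big$cptable[, "nsplit"]
```

The xerror and xstd columns come from random cross-validation folds, so they can differ between runs unless a seed is set with set.seed() beforehand.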

answered Oct 07 '22 15:10

Kevin

The rpart vignette on CRAN (http://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf) mentions passing the option cp = 0 to rpart(). It also describes other options that can be given to rpart(), for example to control the number of splits.

    # cp = 0 removes the complexity threshold on splits
    dfit <- rpart(y ~ x, method = 'class',
                  control = rpart.control(xval = 10, minbucket = 2, cp = 0))
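To try this on the question's data, here is a hedged sketch that applies the same control settings to the kyphosis model and then lists which tree sizes still have no row. The names present and absent are introduced here for illustration. Since the CP table is built from the cost-complexity pruning sequence, some values of nsplit may remain absent even with cp = 0, which is why the sketch checks rather than assumes.

```r
library(rpart)

# Same control settings as above, applied to the question's data
dfit <- rpart(Kyphosis ~ Age + Number + Start, method = "class",
              data = kyphosis,
              control = rpart.control(xval = 10, minbucket = 2, cp = 0))

printcp(dfit)

# Tree sizes (counted in splits) that have a row in the CP table
present <- dfit$cptable[, "nsplit"]

# Sizes between 0 and the largest tree that have no row
absent <- setdiff(0:max(present), present)
absent
```

If absent is empty, every tree size up to the largest one has cross-validation statistics in the table; otherwise it lists the sizes that are skipped.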

answered Oct 07 '22 15:10

Amrita Sawant
