Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart()
function.
One simple approach: the overall error rate is the number of incorrect predictions divided by the total number of predictions, i.e. 1 - accuracy (e.g. 1 - 0.8077 = 0.1923). To get the raw count of errors, sum the off-diagonal elements of the confusion matrix (e.g. 0 + 15 = 15).
There are two error rates to consider: the training error (the fraction of mistakes made on the training set) and the testing error (the fraction of mistakes made on the testing set).
Error rate is calculated as the total number of incorrect predictions (FN + FP) divided by the total number of cases in the dataset (P + N).
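As a quick base-R sketch of this arithmetic (the confusion-matrix counts below are invented for illustration, not taken from the rpart example):

```r
# Toy confusion matrix: rows = predicted class, columns = actual class
# (counts are made up for illustration)
cm <- matrix(c(50,  5,   # predicted "neg": 50 true negatives, 5 false negatives
               10, 35),  # predicted "pos": 10 false positives, 35 true positives
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("neg", "pos"),
                             actual    = c("neg", "pos")))

accuracy <- sum(diag(cm)) / sum(cm)  # (TN + TP) / (P + N)
error    <- 1 - accuracy             # same as (FP + FN) / (P + N)
error                                # 15 / 100 = 0.15
```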
Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example:
> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018
The Root node error is used to compute two measures of predictive performance, based on the values displayed in the rel error and xerror columns, and depending on the complexity parameter (first column):
0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample) -- this is roughly

class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)
1 - sum(diag(class.pred)) / sum(class.pred)
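To check this relationship numerically, here is a short sketch (assuming the same kyphosis fit as above) that recomputes the resubstitution error from the cptable component of the fitted object and compares it with the confusion-matrix figure:

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Root node error: misclassification rate when predicting the majority class
root_err <- 1 - max(table(kyphosis$Kyphosis)) / nrow(kyphosis)  # 17/81

# rel error of the final (most complex) tree, taken from the cptable
rel_err <- fit$cptable[nrow(fit$cptable), "rel error"]

resub <- rel_err * root_err  # resubstitution error rate, about 0.16

# Cross-check against the direct confusion-matrix computation
class.pred <- table(predict(fit, type = "class"), kyphosis$Kyphosis)
direct <- 1 - sum(diag(class.pred)) / sum(class.pred)
all.equal(resub, direct)  # the two should agree
```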
0.82353 x 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV; see xval in rpart.control(), but see also xpred.rpart() and plotcp(), which relies on this kind of measure). This measure is a more objective indicator of predictive accuracy.
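As a hedged sketch of how to pull this cross-validated figure out programmatically: the xerror column of the cptable comes from randomized 10-fold CV, so a seed is set for reproducibility (the seed value is arbitrary):

```r
library(rpart)
set.seed(42)  # xerror is based on random cross-validation folds
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

root_err <- 1 - max(table(kyphosis$Kyphosis)) / nrow(kyphosis)

# Cross-validated error rate for each candidate complexity parameter
cv_err <- fit$cptable[, "xerror"] * root_err

# cp value of the subtree with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
cv_err
best_cp
```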
Note that it is more or less in agreement with the classification accuracy from tree:
> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10
Residual mean deviance:  0.5809 = 41.24 / 71
Misclassification error rate: 0.1235 = 10 / 81
where the Misclassification error rate is computed on the training sample.
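For completeness, the most objective estimate comes from data the tree never saw during fitting. A minimal sketch of a held-out test error (the 70/30 split and the seed are arbitrary choices for illustration, not part of the original answer):

```r
library(rpart)
set.seed(1)  # arbitrary seed for a reproducible split

n <- nrow(kyphosis)
idx <- sample(n, round(0.7 * n))  # 70% training, 30% testing
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

fit <- rpart(Kyphosis ~ Age + Number + Start, data = train)
pred <- predict(fit, newdata = test, type = "class")

test_err <- mean(pred != test$Kyphosis)  # held-out misclassification rate
test_err
```

With only 81 observations a single split is noisy, which is why the cross-validated xerror figure above is usually preferred for a dataset this small.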