
How to compute error rate from a decision tree?

Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.

asked Mar 12 '12 by teo6389


1 Answer

Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example from the rpart documentation:

> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81

          CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018

The Root node error is used to compute two measures of predictive performance, based on the values displayed in the rel error and xerror columns and depending on the complexity parameter (first column):

  • 0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample) -- this is roughly

        class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)
        1 - sum(diag(class.pred)) / sum(class.pred)
  • 0.82353 x 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV; see xval in rpart.control(), and also xpred.rpart() and plotcp(), which rely on this kind of measure). This measure is a more objective indicator of predictive accuracy.
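The two products above can also be computed programmatically from the fitted object rather than read off the printcp() table. A sketch (the exact xerror values depend on the random CV folds, hence the set.seed(); fit$frame$dev[1] / fit$frame$n[1] gives printcp's "Root node error" for a classification tree under the default priors):

```r
library(rpart)
set.seed(42)  # xerror varies with the random cross-validation folds
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Root node error: misclassifications at the root / number of observations
root.error <- fit$frame$dev[1] / fit$frame$n[1]   # 17/81 ~ 0.20988

cp <- fit$cptable
resub.error <- cp[nrow(cp), "rel error"] * root.error  # resubstitution error
cv.error    <- cp[nrow(cp), "xerror"]    * root.error  # cross-validated error
```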

Note that it is more or less in agreement with the classification accuracy reported by the tree package:

> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10
Residual mean deviance:  0.5809 = 41.24 / 71
Misclassification error rate: 0.1235 = 10 / 81

where Misclassification error rate is computed from the training sample.
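If you have enough data, a simple holdout split gives another out-of-sample estimate alongside the cross-validated one. A minimal sketch (the 70/30 split ratio and seed are arbitrary choices, not anything prescribed by rpart):

```r
library(rpart)
set.seed(123)

# Hold out ~30% of the rows as a test set
idx   <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

fit  <- rpart(Kyphosis ~ Age + Number + Start, data = train)
pred <- predict(fit, newdata = test, type = "class")

test.error <- mean(pred != test$Kyphosis)  # error rate on the held-out rows
```

With only 81 observations the holdout estimate will be noisy, which is why the cross-validated xerror from printcp() is usually preferred on a dataset this small.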

answered Sep 21 '22 by chl