 

Calculating prediction accuracy of a tree using rpart's predict method

I have constructed a decision tree using rpart for a dataset.

I have then divided the data into 2 parts - a training dataset and a test dataset. A tree has been constructed for the dataset using the training data. I want to calculate the accuracy of the predictions based on the model that was created.

My code is shown below:

library(rpart)

# reading the data
data <- read.table("source")
names(data) <- c("a", "b", "c", "d", "class")

# generating train and test data: rows selected randomly with an 80/20 split
set.seed(1)  # optional: makes the random split reproducible
trainIndex <- sample(seq_len(nrow(data)), floor(0.8 * nrow(data)))
train <- data[trainIndex, ]
test  <- data[-trainIndex, ]

# tree construction based on information gain
tree <- rpart(class ~ a + b + c + d, data = train, method = "class",
              parms = list(split = "information"))

I now want to calculate the accuracy of the model's predictions by comparing them with the actual values in the training and test data, but I get an error when doing so.

My code is shown below:

t_pred = predict(tree,test,type="class")
t = test['class']
accuracy = sum(t_pred == t)/length(t)
print(accuracy)

I get the following error message:

Error in t_pred == t : comparison of these types is not implemented
In addition: Warning message:
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
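The warning names the two classes being compared: `t_pred` is a factor, but `test['class']` (single brackets) is a one-column data frame, so `==` has no method for that pair. A minimal sketch with a hypothetical four-row data frame (only the column name `class` comes from the question; the values are made up) shows the difference between `test['class']` and `test$class`, and that `length()` of a data frame counts columns, not rows:

```r
# Hypothetical toy data: 'class' is a factor column, and the predictions
# are a factor with the same levels, as predict(..., type = "class") gives.
test <- data.frame(a = c(1, 2, 3, 4),
                   class = factor(c("yes", "no", "yes", "no")))
t_pred <- factor(c("yes", "no", "no", "no"), levels = levels(test$class))

# Single brackets return a one-column data.frame, not a vector.
class(test['class'])    # "data.frame"
length(test['class'])   # 1 (number of columns, not rows)

# $ (or [[ ]]) extracts the column itself, here a factor vector.
class(test$class)       # "factor"
length(test$class)      # 4

# == on two factors with identical level sets compares element by element.
accuracy <- sum(t_pred == test$class) / length(test$class)
accuracy                # 0.75 (3 of 4 predictions match)
```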

On checking the type of t_pred, I found that it is of type integer. However, the documentation

(https://stat.ethz.ch/R-manual/R-devel/library/rpart/html/predict.rpart.html)

states that the predict() method returns a vector.

I don't understand why the variable is an integer rather than a vector. Where have I made a mistake, and how can I fix it?
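The `integer` comes from how R stores factors: `typeof()` reports the storage mode (the integer level codes), while `class()` reports the class that method dispatch, including `==`, actually uses. The predict.rpart help page describes `type = "class"` output as a factor of classifications. A small sketch with hypothetical labels:

```r
# Hypothetical predictions, stored the way predict(..., type = "class")
# returns them: as a factor.
t_pred <- factor(c("setosa", "virginica", "setosa"))

typeof(t_pred)        # "integer": storage mode of the level codes
class(t_pred)         # "factor": what methods such as == dispatch on
is.vector(t_pred)     # FALSE, because a factor carries a class attribute
as.integer(t_pred)    # 1 2 1: the underlying codes
levels(t_pred)        # "setosa" "virginica"
as.character(t_pred)  # the labels, if a plain character vector is needed
```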

Arat254 asked Oct 17 '16 07:10




1 Answer

Try calculating the confusion matrix first:

confMat <- table(test$class,t_pred)

Now you can calculate the accuracy by dividing the sum of the diagonal of the matrix (the correct predictions) by the total sum of the matrix:

accuracy <- sum(diag(confMat))/sum(confMat)
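Putting the pieces together, here is a minimal end-to-end sketch. It assumes the built-in `iris` data as a stand-in for the question's file (so `Species` plays the role of `class`), uses an arbitrary seed, and computes the accuracy on both the training and the test set. `tree_accuracy` is a hypothetical helper name, not part of rpart:

```r
library(rpart)

set.seed(42)  # arbitrary seed, only for a reproducible split

# iris stands in for the question's data; Species plays the role of 'class'.
data <- iris
trainIndex <- sample(seq_len(nrow(data)), floor(0.8 * nrow(data)))
train <- data[trainIndex, ]
test  <- data[-trainIndex, ]

tree <- rpart(Species ~ ., data = train, method = "class",
              parms = list(split = "information"))

# Hypothetical helper: accuracy of a classification tree on a data set,
# via the confusion matrix (rows = actual, columns = predicted).
tree_accuracy <- function(model, newdata, response) {
  pred <- predict(model, newdata, type = "class")
  confMat <- table(actual = newdata[[response]], predicted = pred)
  sum(diag(confMat)) / sum(confMat)
}

train_acc <- tree_accuracy(tree, train, "Species")
test_acc  <- tree_accuracy(tree, test, "Species")

# Equivalent without the table: the mean of a logical vector.
test_acc2 <- mean(predict(tree, test, type = "class") == test$Species)

print(c(train = train_acc, test = test_acc))
```

Because the predicted factor has the same levels, in the same order, as the training response, the rows and columns of the confusion matrix line up and its diagonal holds exactly the correct predictions; `mean(pred == actual)` gives the same number without building the table.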
mtoto answered Sep 20 '22 14:09