I have a data frame which looks like this:
df <- data.frame(iteration = c(1, 1, 1, 1, 1, 1),
                 model = c("RF", "RF", "RF", "SVM", "SVM", "SVM"),
                 label = c(0, 0, 1, 0, 0, 1),
                 prediction = c(0, 1, 1, 0, 1, 1))
  iteration model label prediction
1         1    RF     0          0
2         1    RF     0          1
3         1    RF     1          1
4         1   SVM     0          0
5         1   SVM     0          1
6         1   SVM     1          1
My real data has 10 iterations, more models, and more rows for each model.
What I am trying to do is get the accuracy for each model.
So I want to apply this to each model group (RF, SVM):
table(df$label,df$prediction)
    0 1
  0 2 2
  1 0 2
Then sum the diagonal and divide by the total:
sum(diag(table(df$label,df$prediction)))/sum(table(df$label,df$prediction))
[1] 0.6666667
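The diagonal-over-total calculation can be wrapped in a function and applied to each model separately. This is a minimal base-R sketch, not the only way to do it: `conf_accuracy` is a hypothetical helper name, and `split()` plus `sapply()` is used to run it once per model. One caveat: `diag()` only picks out the correct predictions if the table's rows and columns have the same levels in the same order, so the helper forces both vectors onto shared factor levels (otherwise a model that never predicts one class would produce a non-square table).

```r
df <- data.frame(iteration  = c(1, 1, 1, 1, 1, 1),
                 model      = c("RF", "RF", "RF", "SVM", "SVM", "SVM"),
                 label      = c(0, 0, 1, 0, 0, 1),
                 prediction = c(0, 1, 1, 0, 1, 1))

# Hypothetical helper: accuracy = sum of the confusion-matrix diagonal / total.
# label and prediction share the same factor levels, so the table is square
# and diag() pairs label k with prediction k even if a class is never predicted.
conf_accuracy <- function(d) {
  lv  <- sort(unique(c(d$label, d$prediction)))
  tab <- table(factor(d$label, levels = lv),
               factor(d$prediction, levels = lv))
  sum(diag(tab)) / sum(tab)
}

# split() gives one data frame per model; sapply() applies the helper to each
# and returns a numeric vector named by model.
acc <- sapply(split(df, df$model), conf_accuracy)
print(acc)  # RF and SVM are both 2/3 for this sample
```

To group by iteration as well, the split could be done on `interaction(df$iteration, df$model)` instead of `df$model`.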
Is this a case for tapply, or is this where dplyr comes in handy?
I am quite lost here.
Try:
library(dplyr)
df %>%
group_by(iteration, model) %>%
summarise(accuracy = sum(label == prediction) / n())
Which gives:
#Source: local data frame [2 x 3]
#Groups: iteration [?]
#
# iteration model accuracy
# (dbl) (fctr) (dbl)
#1 1 RF 0.6666667
#2 1 SVM 0.6666667
The idea is to count the number of times label == prediction is TRUE (sum() treats TRUE as 1 and FALSE as 0) and divide by the size of each group, given by n().
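Because sum() counts TRUE as 1, sum(label == prediction) / n() is the same as the mean of the logical vector, so the dplyr summary could equally be written as summarise(accuracy = mean(label == prediction)). Since the question also asked about tapply, here is a minimal base-R sketch using that idea; it assumes the same columns as the example data frame.

```r
df <- data.frame(iteration  = c(1, 1, 1, 1, 1, 1),
                 model      = c("RF", "RF", "RF", "SVM", "SVM", "SVM"),
                 label      = c(0, 0, 1, 0, 0, 1),
                 prediction = c(0, 1, 1, 0, 1, 1))

# Logical vector: TRUE where the prediction matches the label
correct <- df$label == df$prediction

# tapply() splits `correct` by every iteration/model combination and takes the
# mean of each piece; the mean of a logical vector is the share of TRUE values.
acc <- tapply(correct,
              list(iteration = df$iteration, model = df$model),
              mean)
print(acc)  # a matrix: one row per iteration, one column per model
```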