I come from a predominantly Python + scikit-learn background, and I was wondering how one would obtain the cross-validation accuracy for a logistic regression model in R. I was searching and was surprised that there's no easy way to do this. I'm looking for the equivalent of:
import pandas as pd
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.linear_model import LogisticRegression
## Assume pandas dataframe of dataset and target exist.
scores = cross_val_score(LogisticRegression(),dataset,target,cv=10)
print(scores)
For R, I have:
model = glm(df$Y ~ df$X, family = binomial)
summary(model) 
And now I'm stuck. The reason I ask is that the residual deviance of my R model is 1900, which suggests a bad fit, but the Python one gives me 85% 10-fold cross-validation accuracy, which suggests it's good. That seems a bit strange, so I wanted to run cross-validation in R to see whether I get the same result.
Any help is appreciated!
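On the deviance figure in the question: for 0/1 outcomes the residual deviance is -2 times the summed log-likelihood, so it grows with the number of rows and is not on the same scale as accuracy. A deviance of 1900 says little on its own without the sample size. Below is a minimal Python sketch of that point. The labels and probabilities are made up for illustration, and `deviance` and `accuracy` are hypothetical helper names, not functions from any library.

```python
import math

def deviance(y_true, p_pred):
    """Residual deviance for 0/1 outcomes: -2 * summed log-likelihood."""
    return -2.0 * sum(
        math.log(p if y == 1 else 1.0 - p) for y, p in zip(y_true, p_pred)
    )

def accuracy(y_true, p_pred, threshold=0.5):
    """Share of rows where the thresholded probability matches the label."""
    hits = sum((p > threshold) == (y == 1) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Made-up labels and predicted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.3, 0.4, 0.1, 0.6, 0.6, 0.2]

# Same predictions, once on 10 rows and once repeated to 1000 rows.
small = (deviance(y, p), accuracy(y, p))
large = (deviance(y * 100, p * 100), accuracy(y * 100, p * 100))
print(small)
print(large)
```

The accuracy is identical in both cases, while the deviance is 100 times larger on the repeated data, so comparing a raw deviance to an accuracy percentage does not indicate a contradiction by itself.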
R version using caret package:
library(caret)
# define training control
train_control <- trainControl(method = "cv", number = 10)
# train the model on the training set
# (target should be a factor so that caret treats this as classification)
model <- train(target ~ .,
               data = train,
               trControl = train_control,
               method = "glm",
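               family = binomial())
# print the cross-validated accuracy
# (summary(model) would only show the glm coefficients)
print(model)
# per-fold results
model$resample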
Below I took an answer from here and made a few changes.
The changes I made were to make it a logit (logistic) model, add modeling and prediction, store the CV's results, and to make it a fully working example.
Also note that there are many packages and functions you could use, including cv.glm() from boot.
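# Build a binary outcome Y and a single predictor X from ChickWeight
data(ChickWeight)
df                    <- ChickWeight
df$Y                  <- 0
df$Y[df$weight > 100] <- 1
df$X                  <- df$Diet
# Shuffle the rows, then assign each row to one of 10 equally sized folds
df     <- df[sample(nrow(df)), ]
folds  <- cut(seq(1, nrow(df)), breaks = 10, labels = FALSE)
result <- list()
for (i in 1:10) {
  # Hold out fold i for testing and fit on the remaining nine folds
  testIndexes <- which(folds == i)
  testData    <- df[testIndexes, ]
  trainData   <- df[-testIndexes, ]
  model       <- glm(Y ~ X, family = binomial, data = trainData)
  # predict() returns log-odds by default; pass type = "response" for probabilities
  result[[i]] <- predict(model, testData)
}
result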
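You could add a line to calculate accuracy within the loop or just do it after the loop completes. To do so, turn each prediction into a 0/1 label and compare it with testData$Y. Note that predict() on a glm returns log-odds unless type = "response" is given, so threshold at 0 for log-odds or at 0.5 for probabilities.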
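Since the asker comes from Python, here is a rough sketch of that accuracy step in Python terms. It assumes each fold yields log-odds predictions plus the true 0/1 labels. The numbers are hypothetical, and `fold_accuracy` is an illustrative helper, not part of any package.

```python
def fold_accuracy(log_odds, y_true):
    """Accuracy for one fold, given predictions on the log-odds scale."""
    # A log-odds above 0 corresponds to a predicted probability above 0.5.
    labels = [1 if z > 0 else 0 for z in log_odds]
    return sum(l == y for l, y in zip(labels, y_true)) / len(y_true)

# Hypothetical (log-odds predictions, true labels) for three folds.
folds = [
    ([1.2, -0.4, 0.3, -2.0], [1, 0, 0, 0]),
    ([0.8, 0.1, -1.5, -0.2], [1, 1, 0, 0]),
    ([-0.7, 2.1, 0.5, -0.9], [0, 1, 1, 1]),
]

# One score per fold, then the mean, as cross_val_score(...).mean() would give.
scores = [fold_accuracy(z, y) for z, y in folds]
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

In the R loop, the same idea should amount to something like mean((pred > 0) == testData$Y) for each fold, where pred is a hypothetical name for that fold's predictions.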