In short: is there a way to loop through each element of allModelsResults, the list returned by lapply that is described below?
Meaning, allModelsResults$'1' for example gives me 1st element from the object. Next allModelsResults$'2' would be the 2nd element. I would like to create a for loop in order to extract each element, run some commands, and store the results.
Detailed description below...
I have the following code, where I run a simple ML model using "knn" across multiple model specifications. The model specifications are stored in allModelsList, and all the results are stored in allModelsResults.
A single model from allModelsList looks like:
y ~ x1 + x2 + x3
or
y ~ x1 + x5 + x4
and so on... in short a series of combinations of model specifications
allModelsResults <- lapply(allModelsList, function(x) train(x, data=All_categories_merged_done,method = "knn"))
I would like to now extract each element (results from each model) one by one to run analysis on. For example I can manually take:
allModelsResults$'1' to get results from the first model, or allModelsResults$'5' to get results from the 5th model, and so on.
Ideally I would loop through these in a for loop, where each time I select one of the elements and run a series of commands on it.
Any help on how to extract the elements from allModelsResults object would really help! I have about 50 model specifications, so I need to create a loop or something similar to extract one by one automatically.
Specifically, to make this concrete for the community, here is what I would like to do for each model, one by one.
As an example I am extracting model 1 here (this does not work obviously):
aggregate_results <- NULL
for(z in 1:length(categories)){
element_number_ID <- (element_number[z])
# element_number_ID should equal '1' to extract the right model
model_1_result <- allModelsResults$'1'
ResultsTestPred <- predict(model_1_result, testing_data)
results_to_store <- confusionMatrix(ResultsTestPred, testing_data $outcome)
aggregate_results <- rbind(aggregate_results, results_to_store)
}
results_to_store output for one element looks like:
Confusion Matrix and Statistics
          Reference
Prediction  0  1
         0 14  2
         1  4 19
Accuracy : 0.8462
95% CI : (0.6947, 0.9414)
No Information Rate : 0.5385
P-Value [Acc > NIR] : 0.00005274
Kappa : 0.688
Mcnemar's Test P-Value : 0.6831
Sensitivity : 0.7778
Specificity : 0.9048
Pos Pred Value : 0.8750
Neg Pred Value : 0.8261
Prevalence : 0.4615
Detection Rate : 0.3590
Detection Prevalence : 0.4103
Balanced Accuracy : 0.8413
'Positive' Class : 0
I want to save the Accuracy value for each element/model. This way I can compare the model specifications with regard to accuracy.
Any insight would be greatly appreciated!
You seem to want to get predictions and a confusion matrix for each model. Without a reproducible example and with some confusing terminology, I'm doing a lot of guesswork, but I think I understand what you want (or close enough). I'll show you how I would do it with lapply, and then we can do it with a for loop too.
First, get predictions on the testing data. All of these methods give the same result:
# allModelsResults holds the fitted models returned by train() in the question
# lapply way
predictions = lapply(allModelsResults, predict, newdata = testingdata)
# for loop way
predictions = list()
for (i in seq_along(allModelsResults)) {
  predictions[[i]] = predict(allModelsResults[[i]], newdata = testingdata)
}
# manual way - just so you understand exactly what's going on
predictions = list(
  predict(allModelsResults[[1]], newdata = testingdata),
  predict(allModelsResults[[2]], newdata = testingdata),
  predict(allModelsResults[[3]], newdata = testingdata),
  ...
)
Now, predictions is a list, so we access each element with [[. The first one is predictions[[1]], and the kth one is predictions[[k]] if we define some variable k (for example, a loop index). We could also add descriptive names and use the names instead of the indices.
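To make the indexing concrete, here is a small base-R sketch. It uses a toy list with invented values (toy_predictions, k and model_name are illustrative names, not objects from the question), so it only demonstrates the access pattern, not the caret output itself.

```r
# Toy stand-in for a list of per-model results (values are invented)
toy_predictions <- list(c(0, 1, 1), c(1, 1, 0), c(0, 0, 1))

# Positional access with [[ ]], including through a variable
k <- 2
second <- toy_predictions[[k]]

# Optional: give the elements descriptive names, then access by name
# (the name can also be held in a variable)
names(toy_predictions) <- paste0("model_", seq_along(toy_predictions))
model_name <- "model_3"
third <- toy_predictions[[model_name]]
```

The same pattern should carry over to any list, whether it was built by lapply or by a for loop.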
Similarly, we can calculate all the confusion matrices:
# lapply way
conf_matrices = lapply(predictions, confusionMatrix, reference = testingdata$outcome)
# for loop way
conf_matrices = list()
for (p in seq_along(predictions)) {
  # index into the list; p itself is just the position
  conf_matrices[[p]] = confusionMatrix(predictions[[p]], reference = testingdata$outcome)
}
# manual way (for illustration)
conf_matrices = list(
confusionMatrix(predictions[[1]], reference = testingdata$outcome),
confusionMatrix(predictions[[2]], reference = testingdata$outcome),
...
)
Again, we have a list. The first confusion matrix is conf_matrices[[1]], and everything else works the same way as above.
Hopefully that helps us understand how to use lapply or a for loop to create a list.
Now, toward the bottom of your question you seem to imply that you want the Accuracy part of the confusion matrix. I ran the example at the bottom of the help page ?confusionMatrix and looked at the result. Running str(conf_mat) on a result showed me that it is a list, and that the "overall" element of the list is a named vector that includes "Accuracy". So, for an individual confusion matrix cm we can extract the accuracy with cm[["overall"]]["Accuracy"]. We use [[ for the list part and [ for the regular vector part. (We could also use cm$overall["Accuracy"]. $ only works with a name typed literally in the code; it cannot take a variable that holds the name or position. A lot of your issues seem to be related to trying to use $ where the element changes on each pass of a loop. You just can't do that; use [[ instead. See fortunes::fortune(312).)
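As a quick illustration of [[, [ and $, here is a base-R sketch on a mock object. The list cm below is hand-built to resemble only the "overall" part of a confusionMatrix result (the numbers are copied from the output in the question); it is an assumption for illustration, not a real caret object.

```r
# Mock of the relevant part of a confusion matrix result (hand-built)
cm <- list(overall = c(Accuracy = 0.8462, Kappa = 0.688))

a1 <- cm[["overall"]]["Accuracy"]    # named numeric vector of length 1
a2 <- cm$overall["Accuracy"]         # same result, $ with a literal name
a3 <- cm[["overall"]][["Accuracy"]]  # bare number, name dropped

# With the element name held in a variable, [[ ]] works but $ does not
part <- "overall"
a4 <- cm[[part]]["Accuracy"]
missing_part <- cm$part              # NULL: $ looks for an element called "part"
```

Using [[ ]] on the inner vector, as in a3, can be handy when the goal is a plain numeric vector of accuracies without repeated "Accuracy" names.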
So, we can extract the accuracies from our confusion matrix list:
# I use *s*apply here so the result will be *s*implified into a vector
acc = sapply(conf_matrices, function(cm) cm[["overall"]]["Accuracy"])
# for loop way
acc = numeric(length(conf_matrices))
for (i in seq_along(conf_matrices)) {
  acc[i] = conf_matrices[[i]][["overall"]]["Accuracy"]
}
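Since the goal is to compare models by accuracy, one possible next step is to put the accuracies in a small table and sort it. The sketch below uses hand-built stand-ins for the confusion matrices (toy_cms, with invented numbers), and vapply in place of sapply so that each result is checked to be a single number; both choices are assumptions made for illustration.

```r
# Hand-built stand-ins for confusion matrix results (numbers are invented)
toy_cms <- list(
  list(overall = c(Accuracy = 0.85, Kappa = 0.69)),
  list(overall = c(Accuracy = 0.79, Kappa = 0.58)),
  list(overall = c(Accuracy = 0.91, Kappa = 0.82))
)

# vapply behaves like sapply but enforces one numeric value per element
acc <- vapply(toy_cms, function(cm) cm[["overall"]][["Accuracy"]], numeric(1))

# One row per model, sorted from most to least accurate
results <- data.frame(model = seq_along(acc), accuracy = acc)
results <- results[order(results$accuracy, decreasing = TRUE), ]
```

With about 50 model specifications, a table like this makes it straightforward to see which position in the list produced the best accuracy.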
Or, if you know from the beginning you only want the accuracy, we could get there directly without saving the intermediate steps:
# sapply way
acc = sapply(allModelsResults, function(x) {
  pred = predict(x, newdata = testingdata)
  cm = confusionMatrix(pred, reference = testingdata$outcome)
  return(cm[["overall"]]["Accuracy"])
})
# for loop way
acc = numeric(length(allModelsResults))
for (i in seq_along(allModelsResults)) {
  pred = predict(allModelsResults[[i]], newdata = testingdata)
  cm = confusionMatrix(pred, reference = testingdata$outcome)
  acc[i] = cm[["overall"]]["Accuracy"]
}
Notes: As mentioned above, without a reproducible example I'm guessing quite a bit and none of this is tested because I don't have any inputs to test on. I'm presuming that what I see in your question in terms of individual steps, like that we want to predict on each element of allModelsResults, is correct. (If so, it seems like, say, fittedModels would be a much better name than allModelsResults.) I don't know what you mean by "model specifications", and I have no idea what's in allModelsList, but hopefully this gives you enough examples of working with lists that you can work out any kinks. (There may also be, say, mismatched parentheses or missing brackets.)
lapply and sapply are convenient for letting you do less typing than a for loop, but they're not really any different. They set up an object to hold the results, and they fill it up. If you want to create multiple results at the same time, you may want to just use a for loop. And as the number of steps inside gets longer, it can be easier to debug a for loop anyway. Use what you like and what makes sense to you.
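To see that the two styles really do the same job, here is a tiny base-R comparison. The inputs list and the use of sum are placeholders chosen for illustration; any per-element function would do.

```r
# Placeholder inputs (invented for illustration)
inputs <- list(1:3, 4:6, 7:9)

# lapply: allocates the result list and fills it for us
via_lapply <- lapply(inputs, sum)

# for loop: we allocate the result list and fill it ourselves
via_loop <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  via_loop[[i]] <- sum(inputs[[i]])
}

same_result <- identical(via_lapply, via_loop)
```

Preallocating with vector("list", n) is optional for short lists, but it keeps the loop version close to what lapply does internally.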