
R: Extracting elements for loop from lapply object

In short: is there a way to loop through each element of the list allModelsResults (created by lapply, described below)?

Meaning, allModelsResults$'1' for example gives me 1st element from the object. Next allModelsResults$'2' would be the 2nd element. I would like to create a for loop in order to extract each element, run some commands, and store the results.

Detailed description below...

I have the following code, where I run a simple ML model using "knn" across multiple model specifications. The model specifications are stored in allModelsList, and all the results are stored in allModelsResults.

A single model from allModelsList looks like:

y ~ x1 + x2 + x3

or

y ~ x1 + x5 + x4

and so on. In short, a series of model specifications with different combinations of predictors.

allModelsResults <- lapply(allModelsList, function(x) train(x, data = All_categories_merged_done, method = "knn"))

I would like to now extract each element (results from each model) one by one to run analysis on. For example I can manually take:

allModelsResults$'1' to get results from the first model, or allModelsResults$'5' to get results from the 5th model, and so on.

Ideally I would loop through these in a for loop, where each time I select one of the elements and run a series of commands on it.

Any help on how to extract the elements from allModelsResults object would really help! I have about 50 model specifications, so I need to create a loop or something similar to extract one by one automatically.

To be specific, for the benefit of the community, this is what I would like to do for each element, one model at a time.

As an example I am extracting model 1 here (this does not work obviously):

aggregate_results <- NULL

for(z in 1:length(categories)){
    element_number_ID <- (element_number[z])

    # element_number_ID should equal '1' to extract the right model

    model_1_result <- allModelsResults$'1'

    ResultsTestPred <- predict(model_1_result, testing_data)
    results_to_store <- confusionMatrix(ResultsTestPred, testing_data$outcome)

    aggregate_results <- rbind(aggregate_results, results_to_store)

}

results_to_store output for one element looks like:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 14  2
         1  4 19

               Accuracy : 0.8462
                 95% CI : (0.6947, 0.9414)
    No Information Rate : 0.5385
    P-Value [Acc > NIR] : 0.00005274

                  Kappa : 0.688

 Mcnemar's Test P-Value : 0.6831

            Sensitivity : 0.7778
            Specificity : 0.9048
         Pos Pred Value : 0.8750
         Neg Pred Value : 0.8261
             Prevalence : 0.4615
         Detection Rate : 0.3590
   Detection Prevalence : 0.4103
      Balanced Accuracy : 0.8413

       'Positive' Class : 0

From this I want to save the Accuracy value for each element/model, so that I can compare the model specifications with regard to accuracy.

Any insight would be greatly appreciated!

Peter Alexander asked Apr 30 '26 16:04


1 Answer

You seem to want to get predictions and a confusion matrix for each model. Without a reproducible example and with some confusing terminology, I'm doing a lot of guesswork, but I think I understand what you want (or close enough). I'll show you how I would do it with lapply, and then how to do the same thing with a for loop.

First, get predictions on the testing data. All of these methods produce exactly the same result:

# lapply way
# (predict on the fitted models in allModelsResults, not the formulas in allModelsList)
predictions = lapply(allModelsResults, predict, newdata = testing_data)

# for loop way
predictions = list()
for (i in seq_along(allModelsResults)) {
  predictions[[i]] = predict(allModelsResults[[i]], newdata = testing_data)
}

# manual way - just so you understand exactly what's going on
predictions = list(
  predict(allModelsResults[[1]], newdata = testing_data),
  predict(allModelsResults[[2]], newdata = testing_data),
  predict(allModelsResults[[3]], newdata = testing_data),
  ...
)

Now, predictions is a list, so we access each element with [[. The first one is predictions[[1]], and the kth one is predictions[[k]] if we define some variable k (for example, to use in a loop). We could also add descriptive names and use the names instead of the indices.
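To make the indexing concrete, here's a rough sketch on a toy list. The list contents and the model_1, model_2, ... names are invented for the illustration; they are not from the question.

```r
# Toy stand-in for a list of results; the values are placeholders
results <- list(c(1, 2), c(3, 4), c(5, 6))

# Access by position, using a variable as the index
k <- 2
results[[k]]

# Add descriptive names, then access by name (a variable works here too)
names(results) <- paste0("model_", seq_along(results))
nm <- "model_3"
results[[nm]]

# Loop over every element without hard-coding the indices
totals <- numeric(length(results))
for (i in seq_along(results)) {
  totals[i] <- sum(results[[i]])
}
totals
```

seq_along(results) is used rather than 1:length(results) so that the loop body is simply skipped if the list happens to be empty.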

Similarly, we can calculate all the confusion matrices:

# lapply way
conf_matrices = lapply(predictions, confusionMatrix, reference = testing_data$outcome)

# for loop way
conf_matrices = list()
for (p in seq_along(predictions)) {
  # index into the list; p itself is just a number
  conf_matrices[[p]] = confusionMatrix(predictions[[p]], reference = testing_data$outcome)
}

# manual way (for illustration)
conf_matrices = list(
  confusionMatrix(predictions[[1]], reference = testing_data$outcome),
  confusionMatrix(predictions[[2]], reference = testing_data$outcome),
  ...
)

Again, we have a list. The first confusion matrix is conf_matrices[[1]], and everything else works the same way as above.

Hopefully that helps you understand how to use lapply or a for loop to create a list.


Now, toward the bottom of your question you seem to imply that you only want the Accuracy part of the confusion matrix. I ran the example at the bottom of the help page ?confusionMatrix and looked at the result. Running str(conf_mat) on a result showed me that it is a list, and that the "overall" element of the list is a named vector that includes "Accuracy". So, for an individual confusion matrix cm we can extract the accuracy with cm[["overall"]]["Accuracy"]. We use [[ for the list part and [ for the regular vector part. (We could also use cm$overall["Accuracy"]. $ only works when we type out the literal name; it cannot take a variable that holds the name or index. A lot of your issues seem to be related to trying to use $ where the name needs to come from a variable. You just can't do that; use [[ instead. See fortunes::fortune(312).)
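Here's a small sketch of the difference between $ and [[. It uses a hand-built mock object instead of a real confusionMatrix() result, so it runs without caret. The mock only mimics the "overall" element described above; the Accuracy and Kappa numbers are copied from the output in the question.

```r
# Mock object shaped like the relevant part of a confusionMatrix() result:
# a list whose "overall" element is a named numeric vector
cm <- list(overall = c(Accuracy = 0.8462, Kappa = 0.688))

# [[ for the list level, [ for the named-vector level (keeps the name)
cm[["overall"]]["Accuracy"]

# Using [[ at both levels drops the name and gives a bare number
acc_value <- cm[["overall"]][["Accuracy"]]

# $ takes the name literally: with a variable it looks for an element
# called "part", which does not exist, so the result is NULL
part <- "overall"
is.null(cm$part)

# [[ evaluates the variable, so this finds the "overall" element
cm[[part]]["Accuracy"]
```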

So, we can extract the accuracies from our confusion matrix list:

# sapply way - I use *s*apply here so the result will be *s*implified into a vector
acc = sapply(conf_matrices, function(cm) cm[["overall"]]["Accuracy"])

# for loop way
acc = numeric(length(conf_matrices))
for (i in seq_along(conf_matrices)) {
  acc[i] = conf_matrices[[i]][["overall"]]["Accuracy"]
}

Or, if you know from the beginning you only want the accuracy, we could get there directly without saving the intermediate steps:

# sapply way
acc = sapply(allModelsResults, function(x) {
    pred = predict(x, newdata = testing_data)
    cm = confusionMatrix(pred, reference = testing_data$outcome)
    return(cm[["overall"]]["Accuracy"])
  }
)

# for loop way
acc = numeric(length(allModelsResults))
for (i in seq_along(allModelsResults)) {
  pred = predict(allModelsResults[[i]], newdata = testing_data)
  cm = confusionMatrix(pred, reference = testing_data$outcome)
  acc[i] = cm[["overall"]]["Accuracy"]
}
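Since there is no reproducible example to work from, here's a self-contained sketch of the same pattern in base R only. Everything in it is a stand-in: mtcars replaces your data, lm() on a 0/1 outcome thresholded at 0.5 replaces train(..., method = "knn"), and the accuracy is computed by hand rather than with confusionMatrix. Only the list handling is meant to carry over.

```r
# Hypothetical stand-in data: split mtcars into training and testing rows
set.seed(1)
test_rows <- sample(nrow(mtcars), 10)
training_data <- mtcars[-test_rows, ]
testing_data  <- mtcars[test_rows, ]

# A list of model specifications (formulas), playing the role of allModelsList
allModelsList <- list(am ~ wt, am ~ hp, am ~ wt + hp)

# Fit one model per formula, playing the role of allModelsResults
allModelsResults <- lapply(allModelsList, function(f) {
  lm(f, data = training_data)
})

# for loop: one accuracy per fitted model
acc <- numeric(length(allModelsResults))
for (i in seq_along(allModelsResults)) {
  score <- predict(allModelsResults[[i]], newdata = testing_data)
  pred <- as.integer(score > 0.5)            # crude 0/1 classification
  acc[i] <- mean(pred == testing_data$am)    # share of correct predictions
}

# Same calculation with sapply
acc2 <- sapply(allModelsResults, function(m) {
  score <- predict(m, newdata = testing_data)
  mean(as.integer(score > 0.5) == testing_data$am)
})

# One row per specification, ready to compare
comparison <- data.frame(model = sapply(allModelsList, deparse), accuracy = acc)
comparison
```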


Notes: As mentioned above, without a reproducible example I'm guessing quite a bit, and none of this is tested because I don't have any inputs to test on. I'm presuming that what I see in your question in terms of individual steps, like that we want to predict on each element of allModelsResults, is correct. (If so, it seems like, say, fittedModels would be a much better name than allModelsResults.) I don't know what you mean by "model specifications", and I have no idea what's in allModelsList, but hopefully this gives you enough examples of working with lists that you can work out any kinks. (There may also be, say, mismatched parentheses or missing brackets.)

lapply and sapply are convenient because they let you do less typing than a for loop, but they're not really any different. They set up an object to hold the results, and they fill it up. If you want to create multiple results at the same time, you may want to just use a for loop. And as the number of steps inside gets longer, it can be easier to debug a for loop anyway. Use what you like and what makes sense to you.
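As an illustration of filling several results in one pass, here's a sketch that collects two metrics per model into a data frame. It again uses mock confusion-matrix-like lists so it runs without caret: the first entry uses the numbers from the question, the second is invented, and summary_table and best are names I made up.

```r
# Mock confusion-matrix-like results; the second entry is invented
conf_matrices <- list(
  list(overall = c(Accuracy = 0.8462, Kappa = 0.688)),
  list(overall = c(Accuracy = 0.7949, Kappa = 0.59))
)

# Preallocate one row per model, then fill two columns in the same loop
summary_table <- data.frame(
  model    = seq_along(conf_matrices),
  accuracy = NA_real_,
  kappa    = NA_real_
)
for (i in seq_along(conf_matrices)) {
  overall <- conf_matrices[[i]][["overall"]]
  summary_table$accuracy[i] <- overall[["Accuracy"]]
  summary_table$kappa[i]    <- overall[["Kappa"]]
}

# Index of the specification with the highest accuracy
best <- summary_table$model[which.max(summary_table$accuracy)]
summary_table
```

Preallocating the data frame and filling it by row index avoids growing an object with rbind on every iteration, which is what the loop in the question was trying to do.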

Gregor Thomas answered May 02 '26 04:05