
R put multiple randomForest objects into a vector

I am curious whether R can place arbitrary objects into vectors, lists, arrays, etc. I am using the randomForest package to work on subsets of a larger data set and would like to store each fitted model in a list. It would be similar to this:

answers <- c()
for(i in 1:10){
x <- round((1/i), 3)
answers <- (rbind(answers, x))
}

Ideally I'd like to do something like this:

answers <- c()
for(i in 1:10){
RF <- randomForest(training, training$data1, sampsize=c(100), do.trace=TRUE, importance=TRUE, ntree=50, forest=TRUE)
answers <- (rbind(answers, RF))
}

This kind of works but here's the output for a single RF object:

> RF 

Call:
 randomForest(x = training, y = training$data1, ntree = 50, sampsize = c(100), importance = TRUE, do.trace = TRUE,      forest = TRUE) 
               Type of random forest: regression
                     Number of trees: 10
No. of variables tried at each split: 2

          Mean of squared residuals: 0.05343956
                    % Var explained: 14.32

While this is the output for the 'answers' object:

> answers 
   call       type         predicted      mse        rsq        oob.times      importance importanceSD
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
   localImportance proximity ntree mtry forest  coefs y              test inbag
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 

Does anyone know how to store all the RF objects or call them so that the info stored is the same as a single RF object? Thanks for suggestions.

asked Oct 19 '11 02:10 by screechOwl


3 Answers

Don't grow vectors or lists one element at a time. Pre-allocate them and assign objects to specific parts:

answers <- vector("list",10)
for (i in 1:10){
    answers[[i]] <- randomForest(training, training$data1, sampsize=c(100), 
                                 do.trace=TRUE, importance=TRUE, ntree=50,
                                 forest=TRUE)
}

As a side note, rbinding vectors doesn't create another vector or list; if you check your output in your first example you'll see that it is a matrix with one column. That explains the strange behavior you observe when trying to rbind randomForest objects together.
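The side note above can be checked with a small sketch. It uses only base R; `lm()` on the built-in `cars` data set is an assumed stand-in for a `randomForest()` fit and is not part of the original question.

```r
# Growing a result with rbind() produces a matrix, not a vector or list.
answers <- c()
for (i in 1:3) {
  x <- round(1 / i, 3)
  answers <- rbind(answers, x)
}
is.matrix(answers)  # TRUE
dim(answers)        # 3 1

# A fitted model is a list underneath, so rbind() spreads its components
# across the columns of a one-row matrix of mode "list".
fit <- lm(dist ~ speed, data = cars)  # stand-in for a randomForest object
m <- rbind(NULL, fit)
is.matrix(m)        # TRUE
typeof(m)           # "list"
colnames(m)         # the component names of the fitted object
```

This is why printing the rbind result shows entries such as "Numeric,10" instead of the usual model summary.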

answered Nov 09 '22 09:11 by joran


Use lapply:

lapply(1:10,function(i) randomForest(<your parameters>))

You will get a list of random forest objects; you can then access the i-th one with the [[ ]] operator.
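As a runnable sketch of the same pattern, the example below fits one model per random subset and stores the results in a list. The `lm()` call and the `cars` data set are assumptions used as stand-ins so the code runs without the randomForest package; the subset size of 30 is arbitrary.

```r
set.seed(1)  # make the random subsets reproducible

fits <- lapply(1:10, function(i) {
  rows <- sample(nrow(cars), 30)         # a different random subset each time
  lm(dist ~ speed, data = cars[rows, ])  # stand-in for randomForest(...)
})

length(fits)      # 10
class(fits[[3]])  # "lm"
fits[[3]]         # prints exactly like a single fitted model
```

Each element keeps its class, so print(), predict() and similar methods behave as they would on an object that was never stored in a list.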

answered Nov 09 '22 09:11 by mbq


Initialize a list with:

mylist <- vector("list")  # a list is a generic vector; this creates an empty list

Add to it with:

new_element <- 5
mylist <- c(mylist, new_element)
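One caveat when appending model objects this way: c() concatenates the elements of list-like arguments, so a fitted model is split into its components unless it is wrapped in list() first. A minimal sketch, again assuming `lm()` on `cars` as a stand-in for a randomForest fit:

```r
fit <- lm(dist ~ speed, data = cars)  # stand-in for a randomForest object

mylist <- vector("list")

spliced <- c(mylist, fit)        # the components of fit become separate elements
length(spliced) == length(fit)   # TRUE

wrapped <- c(mylist, list(fit))  # fit is stored as a single element
length(wrapped)                  # 1
inherits(wrapped[[1]], "lm")     # TRUE
```

Assigning with mylist[[length(mylist) + 1]] <- fit is another way to append the object whole.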

@joran's advice about pre-allocation is pertinent when the lists are large, but not entirely necessary when they are small. You could also have accessed the matrix you built in your original code. It looks a bit strange, but the information is all in there. For example, the components of the first fit in that matrix of lists could have been recovered with:

answers[1, ]
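A small sketch of what that row extraction returns, with `lm()` on `cars` assumed as a stand-in for the randomForest fit: the components come back as a named list, but the class attribute is not carried along, so the result no longer prints like a model.

```r
fit <- lm(dist ~ speed, data = cars)  # stand-in for a randomForest object
answers <- rbind(NULL, fit)           # one-row matrix of mode "list"

first <- answers[1, ]                 # named list of the fit's components
identical(names(first), names(fit))   # TRUE
inherits(first, "lm")                 # FALSE: the class attribute is gone
first$coefficients                    # individual pieces remain reachable
```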
answered Nov 09 '22 07:11 by IRTFM