
writing to a dataframe from a for-loop in R

I'm trying to write from a loop to a data frame in R, for example a loop like this:

for (i in 1:20) {
  print(c(i+i, i*i, i/1))
}

and to write each line of three values to a data frame with three columns, so that each iteration adds a new row. I've tried using a matrix with ncol=3, filled by rows, but I only get the last item from the loop.

Thanks.

CCID asked Apr 01 '10 21:04


4 Answers

You could use rbind:

d <- data.frame()
for (i in 1:20) {d <- rbind(d,c(i+i, i*i, i/1))}
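
One thing to watch with this approach: binding an unnamed vector onto an empty data frame leaves the column names up to R (typically something derived from the first row's values). A minimal sketch, assuming you want named columns, is to bind a one-row data frame instead. The names double, square and value are made up for illustration and are not from the question.

```r
# Column names below are illustrative assumptions, not part of the question.
d <- data.frame()
for (i in 1:20) {
  # Each iteration appends a one-row data frame, so the names carry over.
  d <- rbind(d, data.frame(double = i + i, square = i * i, value = i / 1))
}
str(d)
```

This still grows the data frame one row at a time, so it is best kept for small loops.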

Karsten W. answered Nov 08 '22 09:11


Another way would be

do.call("rbind", sapply(1:20, FUN = function(i) c(i+i,i*i,i/1), simplify = FALSE))


     [,1] [,2] [,3]
 [1,]    2    1    1
 [2,]    4    4    2
 [3,]    6    9    3
 [4,]    8   16    4
 [5,]   10   25    5
 [6,]   12   36    6

If you don't specify simplify = FALSE, sapply returns a 3 x 20 matrix (one column per iteration), and you have to transpose the result using t instead. This can be tedious for large structures.
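
To make the remark about simplify concrete, here is a small sketch comparing the two forms. It uses only base R; the helper name f is introduced here for brevity. It also converts the matrix to a data frame, since that is what the question asks for.

```r
# f is a helper introduced for this sketch.
f <- function(i) c(i + i, i * i, i / 1)

# simplify = FALSE returns a list of 20 vectors; rbind stacks them as rows.
m1 <- do.call("rbind", sapply(1:20, FUN = f, simplify = FALSE))

# With the default simplify = TRUE, sapply returns a 3 x 20 matrix
# (one column per iteration), so it has to be transposed.
m2 <- t(sapply(1:20, FUN = f))

# Both are 20 x 3 numeric matrices holding the same values.
all(m1 == m2)

# Convert to a data frame; unnamed matrix columns become V1, V2, V3.
d <- as.data.frame(m1)
```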

This solution is especially handy if you have a large data set and/or you need to repeat this many times.

Here are some timings of the solutions in this thread, each run for 20,000 iterations:

> system.time(do.call("rbind", sapply(1:20000, FUN = function(i) c(i+i,i*i,i/1), simplify = FALSE)))
   user  system elapsed 
   0.05    0.00    0.05 

> system.time(ldply(1:20000, function(i)c(i+i, i*i, i/1)))
   user  system elapsed 
   0.14    0.00    0.14 

> system.time({d <- matrix(nrow=20000, ncol=3) 
+ for (i in 1:20000) { d[i,] <- c(i+i, i*i, i/1)}})
   user  system elapsed 
   0.10    0.00    0.09 

> system.time({d <- data.frame()
+ for (i in 1:20000) {d <- rbind(d, c(i+i, i*i, i/1))}})
   user  system elapsed 
  62.88    0.00   62.99 

Roman Luštrik answered Nov 08 '22 10:11


If all your values have the same type and you know the number of rows, you can use a matrix in the following way (this will be very fast):

d <- matrix(nrow=20, ncol=3) 
for (i in 1:20) { d[i,] <- c(i+i, i*i, i/1)}
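
The same-type condition matters because a matrix holds a single type. Here is a small sketch of what happens when the values are mixed, with a data frame alternative that keeps one type per column. The column names square and label are assumptions made for illustration.

```r
# A matrix holds a single type, so one character value converts everything.
d <- matrix(nrow = 3, ncol = 2)
for (i in 1:3) {
  d[i, ] <- c(i * i, letters[i])  # c() coerces the number to a string
}
typeof(d)  # "character": the squares are now strings, not numbers

# A data frame keeps one type per column (column names are illustrative).
df <- data.frame(square = numeric(3), label = character(3),
                 stringsAsFactors = FALSE)
for (i in 1:3) {
  df$square[i] <- i * i
  df$label[i] <- letters[i]
}
str(df)
```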

If you need a data frame, you can use rbind (as another answer suggests), or a function from the plyr package such as ldply:

library(plyr)
ldply(1:20, function(i)c(i+i, i*i, i/1))

cafe876 answered Nov 08 '22 11:11


For loops work through side effects rather than by returning a value, so the usual way of doing this is to create the data frame before the loop and then fill it on each iteration. You can either create it at the correct size and assign your values to the i-th row on each iteration, or start with an empty data frame, append a row, and reassign the whole thing using rbind().

The former approach will have better performance for large datasets, because rbind() copies the whole data frame on every iteration.
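
A minimal sketch of the preallocation approach, assuming the number of rows is known in advance. The column names are made up for illustration.

```r
n <- 20
# Preallocate all n rows up front; column names are illustrative.
d <- data.frame(double = numeric(n), square = numeric(n), value = numeric(n))
for (i in seq_len(n)) {
  # Assign the i-th row in place instead of growing the data frame.
  d[i, ] <- c(i + i, i * i, i / 1)
}
head(d, 3)
```

If the number of rows is not known up front, collecting the rows in a list and binding them once after the loop avoids the repeated copying as well.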

Shane answered Nov 08 '22 10:11