 

Iteratively constructing a data frame in R

I'm relatively new to R, and was wondering what the most efficient way is to iteratively construct a data frame (one row at a time; the number of iterations "n" and the length of each row "l" are known beforehand).

  1. Create empty dataframe, add a row each iteration
  2. Preallocate n x l dataframe, modify a row each iteration
  3. Preallocate n x l matrix, modify a row each iteration, make dataframe from matrix
  4. Something else
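One common form of option 4 collects each row in a pre-sized list and calls `rbind` only once at the end through `do.call`. The sketch below assumes every row is a numeric vector of length `l`; the helper name `build_rows_then_bind` and the row contents are hypothetical placeholders.

```r
# Hypothetical helper: store each row in a pre-sized list, then bind once.
build_rows_then_bind <- function(n, l) {
  rows <- vector("list", n)                  # one empty slot per iteration
  for (i in seq_len(n)) {
    rows[[i]] <- as.numeric(seq_len(l)) * i  # placeholder row data for row i
  }
  m <- do.call(rbind, rows)                  # single rbind -> n x l numeric matrix
  as.data.frame(m)                           # one conversion to data.frame
}

df <- build_rows_then_bind(n = 1000, l = 20)
dim(df)
```

Because the rows are bound in a single call, this avoids the repeated copying that comes from growing a data frame with `rbind` inside the loop.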
asked Oct 27 '10 13:10 by daltonb





2 Answers

Pre-allocate!!!

And use a matrix if the data are all the same type: filling rows of a matrix is much faster than filling rows of a data.frame.

For example:

> n <- 1000      # Number of rows
> row <- 1:20*1  # one row (multiplying by 1 makes it double rather than integer)
> 
> # Adding row, one-by-one
> Data <- data.frame()
> system.time(for(i in 1:n) Data <- rbind(Data,row))
   user  system elapsed 
   2.18    0.00    2.18 
> 
> # Pre-allocated data.frame
> Data <- as.data.frame(Data)  # Data already holds n rows from the loop above, so it serves as the pre-allocated data.frame
> system.time(for(i in 1:n) Data[i,] <- row)
   user  system elapsed 
   0.94    0.00    0.93
>
> # Pre-allocated matrix (fast!)
> Data <- as.matrix(Data)      # the same n x 20 object, now as a numeric matrix
> system.time({ for(i in 1:n) Data[i,] <- row; Data <- as.data.frame(Data) })
   user  system elapsed 
      0       0       0 
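The timings above reuse the object that the first loop already filled. A minimal sketch of option 3 with a freshly pre-allocated matrix might look like the following; `n`, `l`, and the row contents are assumed values matching the example, not part of the original answer.

```r
n <- 1000                        # number of rows, known in advance
l <- 20                          # length of each row, known in advance
row <- as.numeric(seq_len(l))    # one row of doubles, as in the timing example

m <- matrix(NA_real_, nrow = n, ncol = l)  # fresh pre-allocated numeric matrix
for (i in seq_len(n)) {
  m[i, ] <- row                  # row assignment into a matrix is cheap
}
Data <- as.data.frame(m)         # convert to a data.frame once, after the loop
dim(Data)
```

Deferring `as.data.frame` until after the loop means the data.frame conversion cost is paid once instead of on every iteration.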
answered Nov 07 '22 02:11 by Joshua Ulrich



How about pre-allocating with whatever column types you need from a list first?

as.data.frame(list(a1 = vector("numeric", n), a2 = vector("character", n)), stringsAsFactors = FALSE)
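A sketch of filling such a mixed-type pre-allocation one row at a time. It assumes `n = 5` and uses hypothetical values for each row. Passing `stringsAsFactors = FALSE` matters on R versions before 4.0.0, where the default would turn `a2` into a factor and reject new strings.

```r
n <- 5  # number of rows, assumed known in advance

# Pre-allocate typed columns; stringsAsFactors = FALSE keeps a2 as character
# (TRUE was the default in R versions before 4.0.0).
Data <- as.data.frame(list(a1 = vector("numeric", n),
                           a2 = vector("character", n)),
                      stringsAsFactors = FALSE)

for (i in seq_len(n)) {
  # Replace row i with a list, so each column keeps its own type.
  Data[i, ] <- list(i * 1.5, paste0("item", i))  # hypothetical row values
}
print(Data)
```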

answered Nov 07 '22 03:11 by mdsumner
