
How can I prevent rbind() from getting really slow as a data frame grows larger?

I have a data frame with only one row, and I add rows to it using rbind:

df  # my data frame with only one row
for (i in 1:20000) {
    df <- rbind(df, newrow)
}

This gets very slow as i grows. Why is that, and how can I make this type of code faster?

Mark asked Feb 04 '13 19:02



1 Answer

You are in the 2nd circle of hell, namely failing to pre-allocate data structures.

Growing objects in this fashion is a Very Very Bad Thing in R. Either pre-allocate and insert:

df <- data.frame(x = rep(NA, 20000), y = rep(NA, 20000))

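or restructure your code to avoid this sort of incremental addition of rows. As discussed at the link I cite, the reason for the slowness is that each time you add a row, R needs to find a new contiguous block of memory to fit the data frame in and copy the existing rows into it. The bigger the data frame gets, the more there is to copy on every iteration, which is why the loop slows down as i grows. Lots o' copying.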
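To make the two options concrete, here is a minimal sketch of each. It assumes every new row has two numeric columns, x and y; make_row() is a hypothetical stand-in for whatever produces newrow in the question, and the function names are made up for illustration.

```r
# Hypothetical stand-in for whatever computes each new row in the real code.
make_row <- function(i) list(x = as.numeric(i), y = sqrt(i))

# Option 1: pre-allocate the data frame at its final size, then fill it in place.
fill_preallocated <- function(n) {
  df <- data.frame(x = rep(NA_real_, n), y = rep(NA_real_, n))
  for (i in seq_len(n)) {
    row <- make_row(i)
    df$x[i] <- row$x
    df$y[i] <- row$y
  }
  df
}

# Option 2: restructure so rows are collected in a pre-allocated list
# and rbind() is called only once, after the loop has finished.
bind_once <- function(n) {
  rows <- vector("list", n)
  for (i in seq_len(n)) {
    rows[[i]] <- as.data.frame(make_row(i))
  }
  do.call(rbind, rows)
}
```

Either fill_preallocated(20000) or bind_once(20000) would replace the loop in the question. If the rows really are just a few atomic values each, filling plain pre-allocated vectors and calling data.frame() once at the end is probably faster still, since assignment into a data frame carries its own overhead.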

joran answered Oct 12 '22 15:10
