Mysterious problems appending data frames with rbind

I am in a dilly of a pickle trying to join several files together into one master file. There are 5 files with the same structure, and I can read each one individually into a data frame with no problems. I even manually set the column classes for the 200+ variables rather than letting R decide, because I believed that was causing the problem. However, appending any two of the files together causes me to run out of memory.
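The workflow is roughly this (the file names and the abbreviated colClasses vector are just placeholders standing in for the real 200+ column spec):

    # rough sketch of what I am doing; names and the short colClasses
    # vector are placeholders, not the exact code
    col_types <- c("integer", "character", "numeric")    # really ~200 entries
    f1 <- read.csv("file1.csv", colClasses = col_types)  # each file reads fine on its own
    f2 <- read.csv("file2.csv", colClasses = col_types)

    master <- rbind(f1, f2)   # appending any two files fails here

The rbind() call is what produces the warning below.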

    Warning messages:
    1: In rbind(deparse.level, ...) :
      Reached total allocation of 4043Mb: see help(memory.size)

So I did some experimenting: I joined two different chunks of file 1 together. That works. I joined a chunk of file 2 to a chunk of file 1. That works. I joined a chunk of file 2 to the original file 1. That works.

Each of these files comes in at a little under 200MB, so I am not sure why I am running out of memory. If anybody is interested, the data comes from hearstchallenge.com. The competition is long over; we are just using the data for an analysis experiment (and not for programming!).

Any suggestions for how to solve this?

asked Nov 12 '22 by Oliver


1 Answer

I have run into similar problems. The solution is not to use rbind() or cbind() on large data. They build a brand-new object and copy every element of their arguments into it, so peak memory use ends up far larger than the data itself.

To solve your problem using only base R, first create a data frame with the dimensions the combined data will have once the pieces are put together, then fill that large data frame in with assignments, one piece at a time.
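Something along these lines (file names, row counting via count.fields, and the short colClasses vector are placeholders; adapt them to your 200+ columns):

    # rough sketch of the preallocate-and-fill idea; file names and the
    # colClasses vector below are placeholders for your real values
    files     <- c("file1.csv", "file2.csv", "file3.csv", "file4.csv", "file5.csv")
    col_types <- c("integer", "character", "numeric")   # your full colClasses vector

    # count rows per file cheaply first (assumes each file has a header line)
    rows_per_file <- vapply(files,
                            function(f) length(count.fields(f, sep = ",")) - 1L,
                            integer(1))
    n_total <- sum(rows_per_file)

    # build the empty master from the structure of the first file
    template <- read.csv(files[1], colClasses = col_types, nrows = 1)
    master   <- template[rep(NA_integer_, n_total), ]
    rownames(master) <- NULL

    # fill the master one file at a time by direct assignment, no rbind()
    offset <- 0L
    for (i in seq_along(files)) {
      piece <- read.csv(files[i], colClasses = col_types)
      master[offset + seq_len(nrow(piece)), ] <- piece
      offset <- offset + nrow(piece)
      rm(piece); gc()   # free each piece before reading the next
    }

This way only one piece is in memory alongside the master at any time, instead of rbind() holding both inputs plus the full copy it is building.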

answered Nov 15 '22 by Christopher Louden