I have a 9-column data.frame (x) with millions of rows. I was able to read it into R and successfully make some modifications to it; the code executes without a problem. However, when I try to write it out to a .csv file using
write.csv(x, file = argv[2], quote = FALSE, row.names = FALSE)
I get an error which says
Error: cannot allocate vector of size 1.2Gb
This makes no sense, as the data is already in memory and the computations are done; all I want to do is write it out to disk. While monitoring memory, I also saw the process's virtual memory size nearly double during the write phase. Would writing a custom C function to write out this data.frame help? Any suggestions/help/pointers appreciated.
PS: I'm running all this on a 64-bit Ubuntu box with about 24 GB of RAM. The data is about 10 GB, so overall memory shouldn't be an issue.
You have to understand that R functions will often copy their arguments if they modify them. The functional programming paradigm R employs decrees that functions don't change the objects passed in as arguments, so R copies them when changes need to be made in the course of executing a function.
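To see this copy-on-modify behaviour in miniature, here is a small illustrative sketch (the data.frame and function are invented for the example). It assumes R was built with memory profiling (the --enable-memory-profiling configure option) so that tracemem() can report copies:

df <- data.frame(a = 1:5, b = letters[1:5])
tracemem(df)           ## start tracing copies of this object

f <- function(d) {
  d$a <- d$a * 2       ## modifying the argument forces R to duplicate it
  d
}
out <- f(df)           ## tracemem prints the copy made inside f()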
If you build R with memory-tracing support, you can see this copying in action for any operation you are having trouble with. Using the airquality example data set, tracing memory use, I see:
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> tracemem(airquality)
[1] "<0x12b4f78>"
> write.csv(airquality, "airquality.csv")
tracemem[0x12b4f78 -> 0x1aac0d8]: as.list.data.frame as.list lapply unlist which write.table eval eval eval.parent write.csv
tracemem[0x12b4f78 -> 0x1aabf20]: as.list.data.frame as.list lapply sapply write.table eval eval eval.parent write.csv
tracemem[0x12b4f78 -> 0xf8ae08]: as.list.data.frame as.list lapply write.table eval eval eval.parent write.csv
tracemem[0x12b4f78 -> 0xf8aca8]: write.table eval eval eval.parent write.csv
tracemem[0xf8aca8 -> 0xca7fe0]: [<-.data.frame [<- write.table eval eval eval.parent write.csv
tracemem[0xca7fe0 -> 0xcaac50]: [<-.data.frame [<- write.table eval eval eval.parent write.csv
So that indicates 6 copies of the data are being made as R prepares it for writing to file.
Clearly that is eating up the 24 GB of RAM you have available; the error says that R needs another 1.2 Gb to complete an operation.
The simplest solution to start with would be to write the file in chunks. Write the first chunk out with write.csv() to create the file and the header row, then append the remaining chunks. Note that write.csv() ignores the append argument (with a warning), so the appending calls should use write.table() with CSV-compatible settings, as in the sketch below. You may need to play around with this to find a chunk size that will not exceed the available memory.
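Here is a minimal sketch of that approach, reusing x and argv[2] from the question; the chunk size of 100,000 rows is only a starting point to tune against your memory:

chunk_size <- 100000L                 ## tune this to fit your available memory
n <- nrow(x)
starts <- seq(1L, n, by = chunk_size)

for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk_size - 1L, n)
  if (i == 1L) {
    ## first chunk: create the file and write the header
    write.csv(x[rows, ], file = argv[2], quote = FALSE, row.names = FALSE)
  } else {
    ## later chunks: append rows without repeating the header;
    ## write.table() is used because write.csv() ignores append = TRUE
    write.table(x[rows, ], file = argv[2], sep = ",", quote = FALSE,
                row.names = FALSE, col.names = FALSE, append = TRUE)
  }
}

Each x[rows, ] subset still makes a copy, but only of one chunk at a time, so the peak memory overhead is bounded by the chunk size rather than by the whole data.frame.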