Memory error while using write.csv

Tags:

r

I have a nine-column data.frame (x) with millions of rows. I was able to read it into R and successfully make some modifications to it, and the code executes without a problem. However, when I try to write it out to a .csv file using

write.csv(x,file=argv[2],quote=F,row.names=F)

I get an error which says

Error: cannot allocate vector of size 1.2Gb

This makes no sense to me: the data are already in memory, the computations are done, and all I want to do is write the result out to disk. What's more, while I monitored the process, its virtual memory size almost doubled during the write phase. Would writing a custom C function to write out this data.frame help? Any suggestions/help/pointers appreciated.

PS: I'm running all this on a 64-bit Ubuntu box with about 24GB of RAM, so overall memory may not be the issue; the data are about 10GB.

asked by broccoli
1 Answer

You have to understand that R functions often copy their arguments if they modify them: the functional programming paradigm R employs decrees that functions don't change the objects passed in as arguments, so R duplicates an object whenever a function needs to modify it in the course of executing.
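
As a minimal illustration of that semantics (this toy function is mine, not part of the original answer): a function that modifies its argument works on a duplicate, and the caller's object is left untouched.

## Copy-on-modify: the assignment inside f() forces R to duplicate 'd',
## so the caller's data frame is never altered.
f <- function(d) {
  d$flag <- TRUE
  d
}

df <- data.frame(a = 1:3)
f(df)      # returns a modified copy of df
names(df)  # still just "a" -- the original survives intact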

If you build R with memory profiling enabled (configure with --enable-memory-profiling), you can watch this copying in action via tracemem() for any operation you are having trouble with. Using the built-in airquality data set, tracing memory use I see:

> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> tracemem(airquality)
[1] "<0x12b4f78>"
> write.csv(airquality, "airquality.csv")
tracemem[0x12b4f78 -> 0x1aac0d8]: as.list.data.frame as.list lapply unlist which write.table eval eval eval.parent write.csv 
tracemem[0x12b4f78 -> 0x1aabf20]: as.list.data.frame as.list lapply sapply write.table eval eval eval.parent write.csv 
tracemem[0x12b4f78 -> 0xf8ae08]: as.list.data.frame as.list lapply write.table eval eval eval.parent write.csv 
tracemem[0x12b4f78 -> 0xf8aca8]: write.table eval eval eval.parent write.csv 
tracemem[0xf8aca8 -> 0xca7fe0]: [<-.data.frame [<- write.table eval eval eval.parent write.csv 
tracemem[0xca7fe0 -> 0xcaac50]: [<-.data.frame [<- write.table eval eval eval.parent write.csv

So that indicates 6 copies of the data are being made as R prepares it for writing to file.

Copying on that scale is what's eating up the 24Gb of RAM you have available; the error means R asked for a further 1.2Gb for one of those allocations and couldn't get it.

The simplest solution to start with would be to write the file in chunks: write the first chunk of rows with append = FALSE, then append each subsequent chunk. Note that write.csv() ignores the append argument (it warns "attempt to set 'append' ignored"), so the calls need to go through write.table() with sep = ",". You may need to experiment to find a chunk size that will not exceed the available memory, as sketched below.
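
A minimal sketch of that chunked approach; the one-million-row chunk size is only an assumed starting point to tune, and argv[2] is the output path from the question:

chunk_size <- 1e6                      # assumed starting point; tune to your memory
n <- nrow(x)
starts <- seq(1, n, by = chunk_size)

for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk_size - 1, n)
  write.table(x[rows, , drop = FALSE],
              file = argv[2],
              sep = ",",
              quote = FALSE,
              row.names = FALSE,
              col.names = (i == 1),    # write the header only with the first chunk
              append = (i > 1))        # append every chunk after the first
}

Each iteration only materialises copies of a chunk_size slice rather than of the whole data frame, which is what keeps peak memory bounded.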

answered by Gavin Simpson