R data.table Size and Memory Limits

Tags: r, data.table

I have a 15.4 GB R data.table object with 29 million records and 135 variables. My system and R info are as follows:

Windows 7 x64 on an x86_64 machine with 16 GB RAM; "R version 3.1.1 (2014-07-10)" on "x86_64-w64-mingw32".

I get the following memory allocation error:

[screenshot: memory allocation error]

I set my memory limits as follows:

# memory.limit(size = 7000000)  # 7,000,000 MB ~ 7 TB
# Change memory.limit to 40 GB when using the ff library
memory.limit(size = 40000)      # size is in MB, so 40000 = 40 GB

My questions are the following:

  1. Should I change the memory limit to 7 TB?
  2. Should I break the file into chunks and process it chunk by chunk? (A minimal sketch follows this list.)
  3. Any other suggestions?
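For reference, here is a minimal sketch of option 2, assuming the data originates in a CSV file ("mydata.csv" is a hypothetical name) and that only a small per-chunk summary needs to stay in memory:

library(data.table)

infile     <- "mydata.csv"                     # hypothetical source file
chunk_rows <- 1e6                              # rows per chunk; tune to your RAM
total_rows <- 29e6                             # ~29 million records
col_names  <- names(fread(infile, nrows = 0))  # read only the header line

n_chunks  <- ceiling(total_rows / chunk_rows)
summaries <- vector("list", n_chunks)

for (i in seq_len(n_chunks)) {
  chunk <- fread(infile,
                 skip      = 1 + (i - 1) * chunk_rows,  # skip header + earlier rows
                 nrows     = chunk_rows,
                 header    = FALSE,
                 col.names = col_names)
  # ... process the chunk; keep only a small result, e.g. a row count ...
  summaries[[i]] <- nrow(chunk)
  rm(chunk); gc()                              # free the chunk before the next read
}

total_seen <- Reduce(`+`, summaries)           # combine the per-chunk results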
asked Jan 29 '15 by Krishnan


People also ask

What is the maximum data size R can handle?

Under most 64-bit versions of Windows the limit for a 32-bit build of R is 4 GB; for the oldest ones it is 2 GB. The limit for a 64-bit build of R (imposed by the OS) is 8 TB.

How big can a dataframe be in R?

The number is 2^31 - 1. This is the maximum number of rows for a data.frame, but it is so large that you are far more likely to run out of memory for even single vectors before you start collecting several of them.
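You can verify this constant directly in R:

.Machine$integer.max   # 2147483647, i.e. 2^31 - 1
2^31 - 1               # the same value, computed as a double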

How do I check my memory limit in R?

Determining your memory limits in R: two calls, memory.limit() and memory.size(), return the memory limit available to your R session and how much memory the current session is using, respectively.
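A minimal check, assuming a Windows build of R (these calls became non-functional stubs in R >= 4.2):

memory.limit()            # current memory limit for this session, in MB
memory.size()             # MB currently allocated by this R session
memory.size(max = TRUE)   # maximum MB obtained from the OS so far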

Does data.table use less memory?

Memory usage (efficiency): data.table is the most efficient when filtering rows; dplyr is far more efficient when summarizing by group, where data.table was the least efficient.


1 Answer

Try to profile your code to identify which statements cause the "waste of RAM":

# install.packages("pryr")
library(pryr) # for memory debugging

memory.size(max = TRUE) # print max memory used so far (works only with MS Windows!)
mem_used()
gc(verbose=TRUE) # show internal memory stuff (see help for more)

# start profiling your code
Rprof(pfile <- "rprof.log", memory.profiling = TRUE) # log to rprof.log with memory profiling enabled

# !!! Your code goes here

# Print memory statistics within your code wherever you think it is sensible
memory.size(max = TRUE)
mem_used()
gc(verbose=TRUE)

# stop profiling your code
Rprof(NULL)
summaryRprof(pfile, memory = "both") # show the memory consumption profile

Then evaluate the memory consumption profile...

Since your code stops with an "out of memory" exception, you should reduce the input data to an amount that makes your code workable and use this input for memory profiling...
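For example, a minimal sketch, assuming the 29M-row table originates in a CSV file ("mydata.csv" is a hypothetical name):

library(data.table)

# Read a small sample instead of all 29 million rows, then profile against it
DT_small <- fread("mydata.csv", nrows = 1e5)

Rprof(pfile <- "rprof_small.log", memory.profiling = TRUE)
# ... run your processing code against DT_small here ...
Rprof(NULL)
summaryRprof(pfile, memory = "both")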

answered Sep 29 '22 by R Yoda