My problem concerns simple calculations over big data sets (around 25 million rows and 10 columns, i.e. about 1 GB of data). My system is:
32-bit Windows 7 / 4 GB RAM / RStudio 0.96 / R 2.15.2
I can reference my database using the bigmemory package and apply functions to it, and I am also able to do this with the ff package, filehash, etc.
The problem is that while computing simple calculations (unique values, means, etc.) I get the typical error
"cannot allocate vector of size n Mb",
where n can be as small as 70-95 MB.
I know about all (I think) the solutions offered so far for this:
increase RAM,
launch R with the command-line flag --max-mem-size=XXXX,
use the memory.limit() and memory.size() commands,
use rm() and gc(),
work on 64-bit,
close other programs, free memory, reboot,
use packages such as bigmemory, ff, filehash, SQL back ends, etc.,
improve the data types, use integers, shorts, etc.,
check the memory usage of intermediate calculations,
etc.
All of this has been tested and done (except moving to another system/machine, obviously).
But I still get those "cannot allocate vector of size n Mb" errors, where n is around 90 MB for example, with almost no memory in use by R or other programs, everything rebooted and fresh. I am aware of the difference between the memory Windows reports as free and what R can actually allocate, but
it makes no sense, because the available memory is more than 3 GB. I suspect the cause lies somewhere in 32-bit Windows / R memory management, but it seems almost a joke to buy 4 GB of RAM, or switch the whole system to 64-bit, just to allocate 70 MB.
Is there something I am missing?
Determining your memory limits in R: two calls, memory.limit() and memory.size(), report R's current memory allocation limit and how much memory is being used by your current R session, respectively.
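For example, in a Windows R session (both functions are Windows-only; the size argument below is just an illustration):
memory.limit()             # R's current allocation limit, in MB
memory.size()              # memory currently in use by this R session, in MB
memory.size(max = TRUE)    # the most memory this session has used so far
memory.limit(size = 4095)  # request a higher limit (capped by the 32-bit address space)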
Open a command prompt and use cd to navigate to the directory containing rstudio.exe, typically C:\Program Files\RStudio\bin (adapt this to wherever RStudio is installed on your computer). Then launch it with --max-mem-size=4GB and press Enter. You will need to repeat this every time you want to start an R session.
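From a Windows command prompt that looks roughly like this (the path below is the default install location and may differ on your machine):
cd "C:\Program Files\RStudio\bin"
rstudio.exe --max-mem-size=4GB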
You can force R to run its garbage collector and free unused memory right away by running the gc() command in R, or by going to Tools -> Memory -> Free Unused R Memory in RStudio.
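For instance, after dropping a large object you no longer need (big_object is a hypothetical name):
rm(big_object)  # remove the reference so the object becomes collectable
gc()            # run the garbage collector now and print a memory-usage summary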
The problem is that R tries to allocate 90 MB of contiguous space. Unfortunately, after many operations, the memory may be too fragmented to provide such a block.
If possible, try to optimize your code to use small chunks of data at a time.
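A minimal sketch of that idea, assuming a hypothetical file bigdata.csv with a numeric column x, computing the overall mean of x without ever loading the full table:
con <- file("bigdata.csv", open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]   # read the header line once
total <- 0; n <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, col.names = header, nrows = 100000),
    error = function(e) NULL)                         # NULL once no rows are left
  if (is.null(chunk) || nrow(chunk) == 0) break
  total <- total + sum(chunk$x, na.rm = TRUE)
  n     <- n + sum(!is.na(chunk$x))
}
close(con)
total / n   # mean of column x, computed 100,000 rows at a time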
If you're trying to perform simple calculations like the ones you mentioned (e.g. means, row maxima, etc.), you might try the biganalytics package, which allows you to run a number of operations on big.matrix objects.
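A minimal sketch, assuming the same hypothetical bigdata.csv and that all of its columns are numeric; the file is attached as a file-backed big.matrix, so it never has to fit in RAM:
library(bigmemory)
library(biganalytics)
x <- read.big.matrix("bigdata.csv", header = TRUE, type = "double",
                     backingfile = "bigdata.bin",
                     descriptorfile = "bigdata.desc")  # file-backed, not held in RAM
colmean(x, na.rm = TRUE)    # column means
colmax(x, 1, na.rm = TRUE)  # maximum of the first column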
Otherwise, as far as I know, short of switching to a 64-bit OS and 64-bit R, there's not much to do.
Look at the ff package on CRAN. It "tricks" R by storing the data in a file on disk instead of holding it all in RAM, and it works rather well for importing data. You can also use the ffbase package to perform simple, efficient calculations on the resulting ff objects.
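A minimal sketch with the same hypothetical bigdata.csv and column x:
library(ff)
library(ffbase)
d <- read.csv.ffdf(file = "bigdata.csv", header = TRUE)  # data lives on disk, mapped in small pages
mean(d$x)              # ffbase provides mean/sum/etc. methods for ff vectors
length(unique(d$x))    # count of unique values, also computed out of memory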