I have a big data frame taking up about 900 MB of RAM. I tried to modify it like this:
dataframe[[17]][37544]=0
This seems to make R use more than 3 GB of RAM, and R complains "Error: cannot allocate vector of size 3.0 Mb" (I am on a 32-bit machine).
I found that this way works better:
dataframe[37544, 17]=0
but R's memory footprint still doubles and the command takes quite some time to run.
Coming from a C/C++ background, I am really confused by this behavior. I thought something like dataframe[37544, 17]=0 should complete in a blink without costing any extra memory (only one cell should be modified). What is R doing for the commands I posted? What is the right way to modify some elements in a data frame without doubling the memory footprint?
Thanks so much for your help!
Tao
The number is 2^31 - 1. This is the maximum number of rows for a data.frame, but it is so large that you are far more likely to run out of memory for even single vectors before you start collecting several of them.
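As a quick check, and assuming the limit comes from R's 32-bit signed integer type, the same number can be read from .Machine$integer.max. The variable names below are only for illustration:

```r
# The largest value R's integer type can hold
max_int <- .Machine$integer.max

# 2^31 - 1 is computed as a double, but the two values compare equal
same_limit <- (max_int == 2^31 - 1)

print(max_int)
print(same_limit)
```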
R probably uses more memory because it copies objects. Although these temporary copies are deleted, R still holds on to the space. To give this memory back to the OS, you can call the gc function; however, when memory is needed, gc is called automatically.
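A minimal sketch of that copying, using only base R. The vector size and variable names are arbitrary choices for illustration, and tracemem() is guarded because it only works when R was built with memory profiling:

```r
x <- numeric(1e6)            # one million doubles, roughly 8 MB
size_mb <- as.numeric(object.size(x)) / 2^20

# tracemem() reports duplications, but only if memory profiling is available
can_trace <- capabilities("profmem")
if (can_trace) tracemem(x)

y <- x                       # no copy yet: x and y share the same vector
y[1] <- 1                    # the first write to y makes R duplicate the vector

if (can_trace) untracemem(x)

rm(y)                        # the copy is now unreachable
mem <- gc()                  # collect it; gc() returns a matrix of memory usage
print(mem)
```

When tracing is available, the duplication is reported at the line that writes to y, not at the line that assigns y <- x.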
Windows users may get the error that R has run out of memory. If you have R already installed and subsequently install more RAM, you may have to reinstall R in order to take advantage of the additional capacity.
R is memory intensive, so it's best to get as much RAM as possible. If you use virtual machines, you might have restrictions on how much memory you can allocate to a single instance. In that case we recommend getting as much memory as possible and considering multiple nodes. Minimum: 2 cores / 4 GB.
Following up on Joran suggesting data.table, here are some links. Your object, at 900MB, is manageable in RAM even in 32bit R, with no copies at all.
When should I use the := operator in data.table?
Why has data.table defined := rather than overloading <-?
Also, data.table v1.8.0 (not yet on CRAN but stable on R-Forge) has a set() function which provides even faster assignment to elements, as fast as assignment to matrix (appropriate for use inside loops, for example). See the latest NEWS for more details and an example. Also see ?":=" which is linked from ?data.table.
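To illustrate the "inside loops" use of set(), here is a small sketch. The table DT and its columns a and b are hypothetical stand-ins for a large table, and replacing NA with 0 is just an example task:

```r
library(data.table)

# Hypothetical small table standing in for a large one
DT <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))

# Replace NA with 0 in every column, one set() call per column
for (j in seq_along(DT)) {
  na_rows <- which(is.na(DT[[j]]))          # row numbers to overwrite
  set(DT, i = na_rows, j = j, value = 0)    # assigns by reference
}

print(DT)
```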
And, here are 12 questions on Stack Overflow with the data.table tag containing the word "reference".
For completeness:
require(data.table)
DT = as.data.table(dataframe)
# say column 17 is named 'Q' (i.e. LETTERS[17])
# then any of the following :
DT[37544, Q:=0] # using column name (often preferred)
DT[37544, 17:=0, with=FALSE] # using column number
col = "Q"
DT[37544, col:=0, with=FALSE] # variable holding name
col = 17
DT[37544, col:=0, with=FALSE] # variable holding number
set(DT,37544L,17L,0) # using set(i,j,value) in v1.8.0
set(DT,37544L,"Q",0)
But, please do see linked questions and the package's documentation to see how := is more general than this simple example; e.g., combining := with binary search in an i join.
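For readers on a data.table release newer than the v1.8.0 discussed here, a hedged sketch of the same assignments follows. It assumes a later release in which wrapping the left-hand side of := in parentheses replaces the with=FALSE form. The three-row table and its columns P and Q are made up for illustration:

```r
library(data.table)

# Hypothetical small table; P and Q stand in for real column names
DT <- data.table(P = c(1.5, 2.5, 3.5), Q = c(10, 20, 30))
addr_before <- address(DT)             # memory address before any assignment

DT[2, Q := 0]                          # column given by name

col <- "Q"
DT[3, (col) := 0]                      # parentheses: use the value of col

set(DT, i = 1L, j = "P", value = 0)    # set() with named arguments

# Compare addresses to check that DT was modified in place
same_object <- identical(address(DT), addr_before)
print(DT)
print(same_object)
```

The address comparison is one way to confirm that no copy of the table was made along the way.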