We have a very large data.table, to which we append columns, mainly via merge. Occasionally, this triggers a "Cannot allocate vector of size xx Gb" error, even though we know that this amount of memory is available on the system. Our suspicion is that the available memory is not part of one contiguous block, so we would like to somehow preallocate a larger chunk of RAM when creating the data.table.
One obvious suggestion is to create, at the outset, all the columns that will eventually be merged into our data.table from another one. However, this isn't necessarily going to work, because merge is designed not to overwrite the columns of DT1 with the same-named columns of DT2, but to rename them so that both can be kept.
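For illustration (this example is not from the original question), here is a minimal sketch of that renaming: when both tables carry a non-key column b, merge keeps both copies under suffixed names instead of overwriting either one.
library(data.table)
x = data.table(a = 1:3, b = 1:3)
y = data.table(a = 1:3, b = 4:6)
merge(x, y, by = "a")  # result has columns a, b.x, b.y: both copies of b are kept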
Is there anything else that can be done?
Minimal example:
x = data.table(a = 1:10, b=2:11)
y = data.table(a = 1:10, c=2:11)
# want this to happen in the most memory-efficient way possible
# and ideally without allocating new memory at all
# (i.e., want to be able to pre-allocate enough memory in x
# in line 1 to be able to do this)
x = merge(x, y, by="a")
Addressing the question from the code block: "want this to happen in the most memory-efficient way possible".
The most memory-efficient approach is to add columns to your x dataset by reference while doing the join. Since data.table v1.9.5 (the development version at the time of writing), you don't have to setkey before the join:
library(data.table)
x = data.table(a = 1:10, b=2:11)
y = data.table(a = 1:10, c=2:11)
x[y, c := i.c, on="a"]
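If y carried several new columns, the same by-reference idiom assigns them all in one join. A small sketch, where the extra column d is illustrative:
library(data.table)
x = data.table(a = 1:10, b = 2:11)
y = data.table(a = 1:10, c = 2:11, d = 12:21)
x[y, `:=`(c = i.c, d = i.d), on = "a"]  # both columns added by reference in one pass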
If you don't have the recent data.table version, you have to setkey in advance:
library(data.table)
x = data.table(a = 1:10, b=2:11, key="a")
y = data.table(a = 1:10, c=2:11, key="a")
x[y, c := i.c]
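As a side note on the preallocation idea itself (an aside, not part of the original answer): := can add columns without copying because data.table over-allocates spare column-pointer slots when a table is created (see ?truelength and options(datatable.alloccol)). If many columns will be joined in, you can reserve more slots up front with alloc.col:
library(data.table)
x = data.table(a = 1:10, b = 2:11)
truelength(x)            # column slots already allocated beyond ncol(x)
x = alloc.col(x, 2048)   # reserve more slots up front for future := columns
address(x)               # note the address of x ...
x[, e := 1L]             # ... then add a column by reference ...
address(x)               # ... the address is unchanged: no copy was made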