
copy a list of data.tables

Tags: r, data.table

I have the following situation:

1) a list of data tables

2) For testing purposes I deliberately want to (deeply) copy the whole list including the data tables

3) I want to take some element from the copied list and add a new column.

Here is the code:

library(data.table)
x = data.table(aaa = c(1,2))
y = data.table(bbb = c(1,2))
z = list(x,y)
zz = copy(z)

v = zz[[1]]
v = v[, newColumn := 1]

Now I'm getting the following error:

Error in `[.data.table`(res, , `:=`(xxx, TRUE)) : 
(converted from warning) Invalid .internal.selfref detected and fixed
by taking a copy of the whole table so that := can add this new column 
by reference. At an earlier point, this data.table has been copied by R 
(or been created manually using structure() or similar). Avoid key<-, 
names<- and attr<- which in R currently (and oddly) may copy the whole 
data.table. Use set* syntax instead to avoid copying: ?set, ?setnames 
and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 
and DT2 (R's list() used to copy named objects); please upgrade to 
R>v3.0.2 if that is biting. If this message doesn't help, please report 
to datatable-help so the root cause can be fixed.

I don't understand precisely how R handles the copy calls and how they are passed on to data.table, but isn't it like this?

If someone explicitly uses the copy function, then he/she is aware of the difference between 'by-value' and 'by-reference', so he/she should be handed a true copy of the object.

Hence, in my opinion there should not be any error, and I consider it a 'bug' that the error occurs nonetheless. Is that correct?

FW

Fabian Werner asked Jun 22 '15 10:06

1 Answer

copy() is for copying data.tables. You are using it to copy a list. Try:

zz <- lapply(z,copy)
zz[[1]][ , newColumn := 1 ]
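
As a rough check of the per-element copy, the sketch below rebuilds the objects from the question and confirms that adding a column to the copied table leaves the original untouched. It assumes a reasonably recent R (after v3.0.2, as the warning text itself suggests) so that list(x, y) does not copy x and y.

```r
library(data.table)

x <- data.table(aaa = c(1, 2))
y <- data.table(bbb = c(1, 2))
z <- list(x, y)

# Copy each data.table individually instead of copying the list as a whole
zz <- lapply(z, copy)

# Add a column by reference; only the copied table should change
zz[[1]][, newColumn := 1]

names(zz[[1]])  # "aaa" "newColumn"
names(z[[1]])   # "aaa"
names(x)        # "aaa"
```

Because each element goes through copy() on its own, every table in zz is a properly allocated data.table, so := can add the column without the selfref warning.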

With your original code, applying copy() to the list does not give you data.tables that are safe to modify with :=. The list is duplicated as an ordinary R object, and (at least in the data.table version that produced the warning above) the tables inside it are not re-allocated one by one, so their .internal.selfref is no longer valid. Note that z is unnamed, so its elements have to be inspected by position rather than with $x:

library(data.table)
x = data.table(aaa = c(1,2))
y = data.table(bbb = c(1,2))
z = list(x,y)
zz = copy(z)

#  z has no names, so z$x and zz$x are both NULL; compare by position:
.Internal(inspect(zz[[1]]))
.Internal(inspect(z[[1]]))
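
A less verbose way to compare memory locations is data.table's address() helper. The following is a minimal sketch, not part of the original answer, that checks whether a per-element copy lives at a different address than the original table. The variable names addr_original and addr_copy are illustrative only.

```r
library(data.table)

x <- data.table(aaa = c(1, 2))
y <- data.table(bbb = c(1, 2))
z <- list(x, y)

# The list is unnamed, so elements are accessed by position, not by $x
is.null(z$x)

# Copy each table individually
zz <- lapply(z, copy)

# address() returns the memory address of an object as a string
addr_original <- address(z[[1]])
addr_copy     <- address(zz[[1]])

# Different addresses mean the copy is a separate object
addr_original != addr_copy
```
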
Simon O'Hanlon answered Nov 14 '22 23:11
