
data.table 1.8.1: "DT1 = DT2" is not the same as DT1 = copy(DT2)?

Tags: r, data.table

I've noticed some behaviour in data.table that looks inconsistent to me when using different assignment operators. I have to admit I never quite understood the difference between "=" and copy(), so maybe we can shed some light on it here. If you use "=" or "<-" instead of copy() below, then changing the copied data.table changes the original data.table as well.

Please execute the following commands and you will see what I mean:

library(data.table)
example(data.table)

DT
   x y  v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9

DT2 = DT

Now I'll change the v column of DT2:

DT2[ ,v:=3L]
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

But look what happened to DT:

DT
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

It changed as well. So changing DT2 changed the original DT. That is not the case if I use copy():

example(data.table)  # reset DT
DT3 <- copy(DT)
DT3[, v:= 3L]
   x y  v
1: a 1  3
2: a 3  3
3: a 6  3
4: b 1  3
5: b 3  3
6: b 6  3
7: c 1  3
8: c 3  3
9: c 6  3

DT
   x y  v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1  4
5: b 3  5
6: b 6  6
7: c 1  7
8: c 3  8
9: c 6  9

Is this behaviour expected?

Florian Oswald asked Jun 25 '12 15:06


1 Answer

Yes. This is expected behaviour, and well documented.

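data.table modifies objects in place, by reference, rather than copying them, which is what makes it very fast. The flip side is that DT2 = DT does not copy anything: both names refer to the same underlying object, so a change made by reference through one name is visible through the other.

For this reason, if you really want to copy the data, you need to use copy(DT).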
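A minimal sketch of the difference, using a small made-up table instead of the one from example(data.table). It relies on address(), data.table's helper that reports where an object lives in memory; the variable names same_object and is_copy are just for illustration.

```r
library(data.table)

# Small illustrative table (values are made up for this sketch)
DT <- data.table(x = c("a", "b", "c"), v = c(42L, 4L, 5L))

DT2 <- DT        # binds a second name to the same object; no data is copied
DT3 <- copy(DT)  # allocates an independent copy

same_object <- address(DT) == address(DT2)   # TRUE: one object, two names
is_copy     <- address(DT) != address(DT3)   # TRUE: DT3 lives elsewhere

DT2[, v := 3L]   # modifies by reference, so the change shows up in DT too
print(DT$v)      # 3 3 3
print(DT3$v)     # 42 4 5
```

Comparing addresses before calling := is a quick way to check whether two names are aliases of one table.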


From the documentation for ?copy:

The data.table is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., setkey(DT,a)[J("foo")]. If you require a copy, take a copy first (using DT2=copy(DT)). copy() may also sometimes be useful before := is used to subassign to a column by reference. See ?copy.
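One place where "take a copy first" tends to matter is inside functions, where := would otherwise alter the caller's table. The following is only a sketch: add_double is a hypothetical helper, not part of data.table, and the table is made up.

```r
library(data.table)

# Hypothetical helper (not part of data.table): adds a column without
# touching the caller's table by working on a copy.
add_double <- function(dt) {
  out <- copy(dt)        # take the copy first, as the documentation advises
  out[, v2 := v * 2L]    # := now modifies only the copy
  out[]                  # trailing [] so the result prints at top level
}

DT  <- data.table(x = c("a", "b"), v = c(1L, 2L))
res <- add_double(DT)

print(names(DT))    # "x" "v"
print(names(res))   # "x" "v" "v2"
```

Without the copy() call, the v2 column would also appear in DT after the function returns.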

See also this question: Understanding exactly when a data.table is a reference to vs a copy of another
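As a rough illustration of the reference-versus-copy distinction, here is a sketch with a made-up table. It contrasts base-R style assignment with $<-, which leaves the original alone, against set(), which, like :=, modifies by reference.

```r
library(data.table)

DT  <- data.table(x = c("a", "b", "c"), v = c(42L, 4L, 5L))
DT2 <- DT              # still the same object as DT at this point

DT2$v <- 3L            # base-R style assignment: DT2 becomes a separate object
print(DT$v)            # 42 4 5 (DT is untouched)

DT4 <- DT              # another alias of DT
set(DT4, j = "v", value = 0L)   # set(), like :=, modifies by reference
print(DT$v)            # 0 0 0 (DT changed through DT4)
print(DT2$v)           # 3 3 3
```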

Andrie answered Oct 13 '22 12:10