I've noticed some behaviour in data.table that looks inconsistent to me when using different assignment operators. I have to admit I never quite understood the difference between "=" and copy(), so maybe we can shed some light on it here. If you use "=" or "<-" instead of copy() below, then changing the copied data.table changes the original data.table as well.
Please execute the following commands and you will see what I mean:
library(data.table)
example(data.table)
DT
x y v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
DT2 = DT
Now I'll change the v column of DT2:
DT2[ ,v:=3L]
x y v
1: a 1 3
2: a 3 3
3: a 6 3
4: b 1 3
5: b 3 3
6: b 6 3
7: c 1 3
8: c 3 3
9: c 6 3
But look what happened to DT:
DT
x y v
1: a 1 3
2: a 3 3
3: a 6 3
4: b 1 3
5: b 3 3
6: b 6 3
7: c 1 3
8: c 3 3
9: c 6 3
It changed as well, so changing DT2 changed the original DT. Not so if I use copy():
example(data.table) # reset DT
DT3 <- copy(DT)
DT3[, v:= 3L]
x y v
1: a 1 3
2: a 3 3
3: a 6 3
4: b 1 3
5: b 3 3
6: b 6 3
7: c 1 3
8: c 3 3
9: c 6 3
DT
x y v
1: a 1 42
2: a 3 42
3: a 6 42
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
Is this behaviour expected?
Yes. This is expected behaviour, and well documented.
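data.table modifies objects in place (by reference), which is what makes it very fast. DT2 = DT (or DT2 <- DT) does not copy anything; it only gives the same object a second name, so := applied through either name changes the one underlying table.
For this reason, if you really want an independent copy of the data, you need to use copy(DT).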
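A minimal sketch of the difference, assuming a small made-up table rather than the one from example(data.table). The names DT2 and DT3 mirror the question; address() is data.table's helper for reporting an object's memory location.

```r
library(data.table)

# Illustrative data, not the table produced by example(data.table)
DT <- data.table(x = c("a", "b", "c"), v = 1:3)

DT2 <- DT          # binds a second name to the same object; nothing is copied
DT3 <- copy(DT)    # allocates a new, independent data.table

# DT and DT2 share one memory address; DT3 lives somewhere else
same_object  <- address(DT) == address(DT2)
copied_apart <- address(DT) != address(DT3)

DT2[, v := 0L]     # := updates in place, so DT sees the change as well

DT$v               # changed along with DT2
DT3$v              # untouched
```

The plain assignment is cheap precisely because it copies nothing, so the cost of copy() is only paid when an independent table is actually needed.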
From the documentation for ?copy:
The data.table is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., setkey(DT,a)[J("foo")]. If you require a copy, take a copy first (using DT2=copy(DT)). copy() may also sometimes be useful before := is used to subassign to a column by reference. See ?copy.
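One place where taking a copy before := tends to matter is inside a function, since := would otherwise alter the caller's table. The sketch below assumes a hypothetical helper named add_double and a made-up two-row table; neither comes from the question.

```r
library(data.table)

# Hypothetical helper: adds a column without touching the caller's table
add_double <- function(dt) {
  out <- copy(dt)        # work on an independent copy
  out[, v2 := v * 2L]    # := now modifies only the copy
  out[]                  # return the result visibly
}

DT  <- data.table(x = c("a", "b"), v = 1:2)
res <- add_double(DT)

names(DT)    # still only x and v
names(res)   # x, v and the new v2
```

Without the copy() call, the new column would be added to the table the caller passed in, which is often surprising for code that looks like a pure function.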
See also this question: "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table".