I would like to modify a data.table within a function. If I use the := operator within the function, the result is only printed the second time I ask for it.
Look at the following illustration:
library(data.table)
mydt <- data.table(x = 1:3, y = 5:7)
myfunction <- function(dt) {
  dt[, z := y - x]
  dt
}
When I call only the function, the table is not printed (which is the standard behaviour). However, if I save the returned data.table into a new object, it is not printed the first time I type the object's name, only the second time.
myfunction(mydt) # nothing is printed
result <- myfunction(mydt)
result # nothing is printed
result # for the second time, the result is printed
mydt
# x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4
Could you explain why this happens and how to prevent it?
As David Arenburg mentions in a comment, the answer can be found here. A bug was fixed in version 1.9.6, but the fix introduced this downside.
One should call DT[] at the end of the function to prevent this behaviour.
myfunction <- function(dt) {
  dt[, z := y - x][]
}
myfunction(mydt) # prints immediately
# x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4
This is described in data.table FAQ 2.23:
Why do I have to type DT sometimes twice after using := to print the result to console?
This is an unfortunate downside to get #869 to work. If a := is used inside a function with no DT[] before the end of the function, then the next time DT is typed at the prompt, nothing will be printed. A repeated DT will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then print(DT) and DT[] at the prompt are guaranteed to print. As before, adding an extra [] on the end of := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][].
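For the case the FAQ mentions, where the function cannot be changed, here is a minimal sketch of the caller-side workaround using the DT[] form. The name add_z is hypothetical and simply stands in for a function you do not control; it mirrors the original myfunction from the question.

```r
library(data.table)

# Hypothetical function we are assuming cannot be edited:
# it uses := and returns the table without a trailing [].
add_z <- function(dt) {
  dt[, z := y - x]
  dt
}

mydt <- data.table(x = 1:3, y = 5:7)
result <- add_z(mydt)

# Typing `result` alone at the prompt may print nothing the first time.
# Appending an empty [] asks data.table to return the table visibly,
# so it is printed straight away.
result[]
```

Note that := modifies mydt by reference, so after the call mydt also has the new column z; the empty [] only affects printing, not the data.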