I have a function that returns two values in a list. Both values need to be added to a data.table in two new columns. Evaluation of the function is costly, so I would like to avoid having to compute the function twice. Here's the example:
    library(data.table)
    example(data.table)
    DT
       x y  v
    1: a 1 42
    2: a 3 42
    3: a 6 42
    4: b 1  4
    5: b 3  5
    6: b 6  6
    7: c 1  7
    8: c 3  8
    9: c 6  9
Here's an example of my function. Remember that it is costly to compute. On top of that, in the real function there is no way to deduce one return value from the other (unlike in this toy example, where you could):
    myfun <- function (y, v) {
      ret1 = y + v
      ret2 = y - v
      return(list(r1 = ret1, r2 = ret2))
    }
Here's my way to add two columns in one statement. That one needs to call myfun twice, however:
    DT[,new1:=myfun(y,v)$r1][,new2:=myfun(y,v)$r2]
       x y  v new1 new2
    1: a 1 42   43  -41
    2: a 3 42   45  -39
    3: a 6 42   48  -36
    4: b 1  4    5   -3
    5: b 3  5    8   -2
    6: b 6  6   12    0
    7: c 1  7    8   -6
    8: c 3  8   11   -5
    9: c 6  9   15   -3
Any suggestions on how to do this? I could save r2 in a separate environment each time I call myfun, but I really just need a way to add two columns by reference at the same time.
Since data.table v1.8.3, you can do this:
DT[, c("new1","new2") := myfun(y,v)]
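To check that this form really evaluates the function only once, here is a minimal sketch. The table is built by hand to match the one printed in the question (an assumption, since `example(data.table)` may produce different data in other versions), and `n_calls` is a hypothetical counter added purely for illustration.

```r
library(data.table)

# Hand-built copy of the table from the question (assumed, not taken
# from example(data.table))
DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = c(42, 42, 42, 4, 5, 6, 7, 8, 9))

n_calls <- 0L  # hypothetical counter, only here to count evaluations

myfun <- function(y, v) {
  n_calls <<- n_calls + 1L           # record each call
  list(r1 = y + v, r2 = y - v)       # two results, returned together
}

# One call to myfun; both list elements are assigned by reference
DT[, c("new1", "new2") := myfun(y, v)]

n_calls  # number of times myfun was evaluated
```

The elements of the returned list are matched to the names on the left-hand side by position, so the order of `c("new1", "new2")` has to follow the order of `r1` and `r2` in the list.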
Another option is storing the output of the function and adding the columns one-by-one:
    z <- myfun(DT$y,DT$v)
    head(DT[,new1:=z$r1][,new2:=z$r2])
    #      x y  v new1 new2
    # [1,] a 1 42   43  -41
    # [2,] a 3 42   45  -39
    # [3,] a 6 42   48  -36
    # [4,] b 1  4    5   -3
    # [5,] b 3  5    8   -2
    # [6,] b 6  6   12    0
The approach above cannot be used when the function is not vectorized.
For example in the following situation it will not work as intended:
    myfun <- function (y, v, g) {
      ret1 = y + v + length(g)
      ret2 = y - v + length(g)
      return(list(r1 = ret1, r2 = ret2))
    }
    DT
    #    v y     g
    # 1: 1 1     1
    # 2: 1 3   4,2
    # 3: 1 6 9,8,6
    DT[,c("new1","new2"):=myfun(y,v,g)]
    DT
    #    v y     g new1 new2
    # 1: 1 1     1    5    3
    # 2: 1 3   4,2    7    5
    # 3: 1 6 9,8,6   10    8
It will always add the length of the whole column g (here 3), not the length of each vector in g.
A solution in such a case is:
    DT[, c("new1","new2") := data.table(t(mapply(myfun,y,v,g)))]
    DT
    #    v y     g new1 new2
    # 1: 1 1     1    3    1
    # 2: 1 3   4,2    6    4
    # 3: 1 6 9,8,6   10    8
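As a possible variant of the same row-wise idea, the sketch below swaps `mapply` and `t` for `Map` plus `rbindlist`, which returns plain numeric columns. The table is reconstructed by hand from the printed output above, so its exact construction is an assumption.

```r
library(data.table)

# Reconstructed from the printed output; g is a list column
DT <- data.table(v = c(1, 1, 1),
                 y = c(1, 3, 6),
                 g = list(1, c(4, 2), c(9, 8, 6)))

myfun <- function(y, v, g) {
  list(r1 = y + v + length(g), r2 = y - v + length(g))
}

# Map calls myfun once per row, so length(g) sees one vector at a time;
# rbindlist stacks the per-row lists into a two-column data.table
DT[, c("new1", "new2") := rbindlist(Map(myfun, y, v, g))]
DT
```

The function is still called only once per row, so each costly evaluation happens a single time.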