
Add multiple columns to R data.table in one function call?

Tags: r, data.table

I have a function that returns two values in a list. Both values need to be added to a data.table in two new columns. Evaluation of the function is costly, so I would like to avoid having to compute the function twice. Here's the example:

    library(data.table)
    example(data.table)
    DT
       x y  v
    1: a 1 42
    2: a 3 42
    3: a 6 42
    4: b 1  4
    5: b 3  5
    6: b 6  6
    7: c 1  7
    8: c 3  8
    9: c 6  9

Here's an example of my function. Remember, it's costly to compute, and in the real function there is no way to deduce one return value from the other (unlike in the simple example below):

    myfun <- function (y, v)
    {
        ret1 = y + v
        ret2 = y - v
        return(list(r1 = ret1, r2 = ret2))
    }

Here's my way to add two columns in one statement. That one needs to call myfun twice, however:

    DT[,new1:=myfun(y,v)$r1][,new2:=myfun(y,v)$r2]
       x y  v new1 new2
    1: a 1 42   43  -41
    2: a 3 42   45  -39
    3: a 6 42   48  -36
    4: b 1  4    5   -3
    5: b 3  5    8   -2
    6: b 6  6   12    0
    7: c 1  7    8   -6
    8: c 3  8   11   -5
    9: c 6  9   15   -3

Any suggestions on how to do this? I could save r2 in a separate environment each time I call myfun; I just need a way to add two columns by reference at once.

Florian Oswald asked Jul 03 '12 10:07



2 Answers

Since data.table v1.8.3, you can do this:

    DT[, c("new1","new2") := myfun(y,v)]
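As a rough check that this form evaluates the function only once, here is a minimal sketch. The `calls` counter is a hypothetical addition for illustration only, and `DT` is rebuilt by hand to match the table printed in the question rather than taken from `example(data.table)`.

```r
library(data.table)

# Hypothetical counter, added only to show how many times myfun runs.
calls <- 0L

# Same toy function as in the question, plus the counter increment.
myfun <- function(y, v) {
  calls <<- calls + 1L
  list(r1 = y + v, r2 = y - v)
}

# Hand-built copy of the table shown in the question.
DT <- data.table(
  x = rep(c("a", "b", "c"), each = 3),
  y = rep(c(1, 3, 6), times = 3),
  v = c(42, 42, 42, 4, 5, 6, 7, 8, 9)
)

# One call to myfun fills both new columns by reference.
DT[, c("new1", "new2") := myfun(y, v)]

print(calls)
print(DT)
```

With no `by` clause the right-hand side is evaluated a single time, so the counter should read 1 here; with grouping, the function would instead run once per group.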

Another option is storing the output of the function and adding the columns one-by-one:

    z <- myfun(DT$y,DT$v)
    head(DT[,new1:=z$r1][,new2:=z$r2])
    #      x y  v new1 new2
    # [1,] a 1 42   43  -41
    # [2,] a 3 42   45  -39
    # [3,] a 6 42   48  -36
    # [4,] b 1  4    5   -3
    # [5,] b 3  5    8   -2
    # [6,] b 6  6   12    0

flodel answered Oct 16 '22 10:10


The answer above cannot be used when the function is not vectorized.

For example in the following situation it will not work as intended:

    myfun <- function (y, v, g)
    {
      ret1 = y + v + length(g)
      ret2 = y - v + length(g)
      return(list(r1 = ret1, r2 = ret2))
    }
    DT
    #    v y                  g
    # 1: 1 1                  1
    # 2: 1 3                4,2
    # 3: 1 6              9,8,6

    DT[,c("new1","new2"):=myfun(y,v,g)]
    DT
    #    v y     g new1 new2
    # 1: 1 1     1    5    3
    # 2: 1 3   4,2    7    5
    # 3: 1 6 9,8,6   10    8

It always adds the length of the whole column g (here 3), not the length of each vector in g.

A solution in such a case is:

    DT[, c("new1","new2") := data.table(t(mapply(myfun,y,v,g)))]
    DT
    #    v y     g new1 new2
    # 1: 1 1     1    3    1
    # 2: 1 3   4,2    6    4
    # 3: 1 6 9,8,6   10    8

Vasco answered Oct 16 '22 08:10
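A possible variant of the same row-by-row idea, sketched here as an assumption rather than as part of the original answer, is to collect the per-row results with `Map` and stack them with `rbindlist`. The `DT` below is rebuilt by hand to match the three-row table printed above.

```r
library(data.table)

# Same non-vectorized function as above: length(g) must be taken per row.
myfun <- function(y, v, g) {
  ret1 <- y + v + length(g)
  ret2 <- y - v + length(g)
  list(r1 = ret1, r2 = ret2)
}

# Hand-built copy of the table shown above; g is a list column.
DT <- data.table(
  v = c(1, 1, 1),
  y = c(1, 3, 6),
  g = list(1, c(4, 2), c(9, 8, 6))
)

# Map() calls myfun once per row and returns a list of per-row results;
# rbindlist() stacks those results into a two-column table, which is
# then assigned to the two new columns in one step.
DT[, c("new1", "new2") := rbindlist(Map(myfun, y, v, g))]

print(DT)
```

In this sketch the function still runs once per row, so it does not reduce the cost per evaluation; it only avoids computing each row twice.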