I want to add many new columns simultaneously to a data.table
based on by-group computations. A working example of my data would look something like this:
Time Stock x1 x2 x3
1: 2014-08-22 A 15 27 34
2: 2014-08-23 A 39 44 29
3: 2014-08-24 A 20 50 5
4: 2014-08-22 B 42 22 43
5: 2014-08-23 B 44 45 12
6: 2014-08-24 B 3 21 2
Now I want to scale
and sum
many of the variables to get an output like:
Time Stock x1 x2 x3 x2_scale x3_scale x2_sum x3_sum
1: 2014-08-22 A 15 27 34 -1.1175975 0.7310560 121 68
2: 2014-08-23 A 39 44 29 0.3073393 0.4085313 121 68
3: 2014-08-24 A 20 50 5 0.8102582 -1.1395873 121 68
4: 2014-08-22 B 42 22 43 -0.5401315 1.1226726 88 57
5: 2014-08-23 B 44 45 12 1.1539172 -0.3274462 88 57
6: 2014-08-24 B 3 21 2 -0.6137858 -0.7952265 88 57
A brute force implementation of my problem would be:
library(data.table)
set.seed(123)
d <- data.table(Time = rep(seq.Date( Sys.Date(), length=3, by="day" )),
Stock = rep(LETTERS[1:2], each=3 ),
x1 = sample(1:50, 6),
x2 = sample(1:50, 6),
x3 = sample(1:50, 6))
d[,x2_scale:=scale(x2),by=Stock]
d[,x3_scale:=scale(x3),by=Stock]
d[,x2_sum:=sum(x2),by=Stock]
d[,x3_sum:=sum(x3),by=Stock]
Other posts describing a similar issue (Add multiple columns to R data.table in one function call? and Assign multiple columns using := in data.table, by group) suggest the following solution:
d[, c("x2_scale","x3_scale"):=list(scale(x2),scale(x3)), by=Stock]
d[, c("x2_sum","x3_sum"):=list(sum(x2),sum(x3)), by=Stock]
But again, this would get very messy with a lot of variables and also this brings up an error message with scale
(but not with sum
since this isn't returning a vector).
Is there a more efficient way to achieve the required result (keeping in mind that my actual data set is quite large)?
In the data table structure, click the plus sign between the nodes where you want to add columns, and select Add columns.
How do I concatenate two columns in R? To concatenate two columns you can use the <code>paste()</code> function. For example, if you want to combine the two columns A and B in the dataframe df you can use the following code: <code>df['AB'] <- paste(df$A, df$B)</code>.
After you create a DataTable and define its structure using columns and constraints, you can add new rows of data to the table. To add a new row, declare a new variable as type DataRow. A new DataRow object is returned when you call the NewRow method.
I think with a small modification to your last code you can easily do both for as many variables you want
vars <- c("x2", "x3") # <- Choose the variable you want to operate on
d[, paste0(vars, "_", "scale") := lapply(.SD, function(x) scale(x)[, 1]), .SDcols = vars, by = Stock]
d[, paste0(vars, "_", "sum") := lapply(.SD, sum), .SDcols = vars, by = Stock]
## Time Stock x1 x2 x3 x2_scale x3_scale x2_sum x3_sum
## 1: 2014-08-22 A 13 14 32 -1.1338934 1.1323092 87 44
## 2: 2014-08-23 A 25 39 9 0.7559289 -0.3701780 87 44
## 3: 2014-08-24 A 18 34 3 0.3779645 -0.7621312 87 44
## 4: 2014-08-22 B 44 8 6 -0.4730162 -0.7258662 59 32
## 5: 2014-08-23 B 49 3 18 -0.6757374 1.1406469 59 32
## 6: 2014-08-24 B 15 48 8 1.1487535 -0.4147807 59 32
For simple functions (that don't need special treatment like scale
) you could easily do something like
vars <- c("x2", "x3") # <- Define the variable you want to operate on
funs <- c("min", "max", "mean", "sum") # <- define your function
for(i in funs){
d[, paste0(vars, "_", i) := lapply(.SD, eval(i)), .SDcols = vars, by = Stock]
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With