I have a data.table and I am trying to create a new variable that is a function of all the other columns. A simplified example would be summing or averaging across the columns within each row. For example:
library(data.table)
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))
I want to create a vector/column that is simply the average of all the columns. The syntax that I think of would look something like this:
dt[, average := mean(.SD)]
However, this passes the whole table to mean() at once rather than working row by row. I know I can also do:
dt[, average := lapply(.SD, mean)]
But this gives a single row result. I'm essentially looking for the equivalent of:
dt[, average := lapply(.SD, mean), by = all]
such that it simply calculates this for all the rows, without having to create an "id" column and doing all of my calculating by that column. Is this possible?
The following data.table code worked for me.
dt[, average := rowMeans(.SD)]
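As a minimal sketch of the same idea, the example below rebuilds the table from the question (assuming column c was meant to be 11:19) and restricts the calculation with .SDcols. The name value_cols is introduced here only for illustration. Naming the input columns matters once you start adding result columns, because by default .SD includes every column in the table, including ones you created earlier.

```r
library(data.table)

# Example table from the question; column c is assumed to be 11:19
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))

# Illustrative name: the columns that should feed the row-wise calculation
value_cols <- c("a", "b", "c", "d")

# rowMeans() is vectorised over rows, so no grouping is needed
dt[, average := rowMeans(.SD), .SDcols = value_cols]

# rowSums() works the same way; .SDcols keeps `average` out of the total
dt[, total := rowSums(.SD), .SDcols = value_cols]
```

For the first row this averages 1, 10, 11 and 100.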
As pointed out by @jangorecki, it is possible to construct your own function to run by row as long as you remember that each row is a list object:
# my function, must unlist the argument
myMean <- function(i, ...) mean(unlist(i), ...)
Using by = seq_len:
dt[, averageNew := myMean(.SD), by = seq_len(nrow(dt))]
Using row.names:
dt[, averageOther := myMean(.SD), by = row.names(dt)]
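The by-row approach is mostly useful for functions that have no vectorised row-wise counterpart. The sketch below applies it to a row median, again assuming column c was meant to be 11:19. The helper row_apply and the name value_cols are hypothetical, introduced only for this example. .SDcols is set explicitly because, without it, .SD would also contain any result columns added to the table earlier.

```r
library(data.table)

# Example table from the question; column c is assumed to be 11:19
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))
value_cols <- c("a", "b", "c", "d")

# Hypothetical helper: flatten a one-row .SD into a vector, then apply f to it
row_apply <- function(sd, f, ...) f(unlist(sd), ...)

# Grouping by the row index means .SD holds exactly one row per group
dt[, row_median := row_apply(.SD, median), by = seq_len(nrow(dt)), .SDcols = value_cols]
```

Grouping by every row calls the function once per row, so on large tables a vectorised function such as rowMeans() is likely to be considerably faster where one exists.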