
Use by = each row for data table

Tags: r, data.table

I have a data table and I am trying to create a new variable that is a function of all the other columns. A simplified example would be summing or averaging across the columns within each row. For example:

library(data.table)
dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))

I want to create a vector/column that is simply the row-wise average of all the columns. The syntax that I think of would look something like this:

dt[, average := mean(.SD)]

However, this operates on the whole table at once rather than row by row. I know I can also do:

dt[, average := lapply(.SD, mean)] 

But this computes one mean per column, i.e. a single-row result. I'm essentially looking for the equivalent of:

dt[, average := lapply(.SD, mean), by = all]

such that it simply calculates this for all the rows, without having to create an "id" column and doing all of my calculating by that column. Is this possible?

asked Apr 22 '16 20:04 by Brandon


1 Answer

The following data.table code worked for me:

dt[, average := rowMeans(.SD)]
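A minimal sketch of the same idea, assuming you want to pin down which columns enter the average. `.SDcols` restricts `.SD`, so previously added columns are not pulled into the mean if the line is run again. The vector name `cols` is purely illustrative, and `c = 11:19` is an assumption about what the question's example intended.

```r
library(data.table)

dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))

# Hypothetical name: the columns that should enter the average
cols <- c("a", "b", "c", "d")

# rowMeans() treats .SD as a matrix and averages across columns for each row
dt[, average := rowMeans(.SD), .SDcols = cols]
```

Because `rowMeans()` is vectorised over rows, no `by` is needed here, which is usually much faster than grouping by row on large tables.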

As pointed out by @jangorecki, you can also write your own function and run it row by row, as long as you remember that within each group .SD is a one-row data.table, which is a list, so it has to be unlisted first:

# my function, must unlist the argument
myMean <- function(i, ...) mean(unlist(i), ...)

Using by = seq_len(nrow(dt)):

dt[, averageNew := myMean(.SD), by = seq_len(nrow(dt))]

Using by = row.names(dt):

dt[, averageOther := myMean(.SD), by = row.names(dt)]
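The by-row pattern is mainly useful when no vectorised `row*()` function exists for the statistic you need. The sketch below applies it to a row-wise median; `byRow`, `cols`, and `rowMedian` are hypothetical names introduced for illustration, and `c = 11:19` is again an assumption about the intended example data.

```r
library(data.table)

dt <- data.table(a = 1:9, b = seq(10, 90, 10), c = 11:19, d = seq(100, 900, 100))
cols <- c("a", "b", "c", "d")

# Hypothetical wrapper: flatten the one-row .SD to a plain vector, then apply f
byRow <- function(sd, f, ...) f(unlist(sd), ...)

# One group per row; .SDcols keeps the new column out of later calculations
dt[, rowMedian := byRow(.SD, median), by = seq_len(nrow(dt)), .SDcols = cols]

# Cross-check against base R's apply() over the rows of the same columns
check <- apply(as.matrix(dt[, ..cols]), 1, median)
stopifnot(isTRUE(all.equal(dt$rowMedian, unname(check))))
```

Grouping by row calls the function once per row, so it will likely be noticeably slower than `rowMeans()` on large tables; it trades speed for flexibility.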

answered Oct 17 '22 07:10 by lmo