How to create mean and s.d. columns in data.table

Question

The following code/outcome baffles me as to why data.table returns NA for the mean functions and not the sd function.

library(data.table)
test <- data.frame('id'=c(1,2,3,4,5),
                   'A'=seq(2,9,length=5),
                   'B'=seq(3,9,length=5),
                   'C'=seq(4,9,length=5),
                   'D'=seq(5,9,length=5))

test <- as.data.table(test)

test[,`:=`(mean_test = mean(.SD), sd_test = sd(.SD)),by=id,.SDcols=c('A','B','C','D')]
> test
   id    A   B    C    D mean_test   sd_test
   1:  1 2.00 3.0 4.00 5        NA 1.2909944
   2:  2 3.75 4.5 5.25 6        NA 0.9682458
   3:  3 5.50 6.0 6.50 7        NA 0.6454972
   4:  4 7.25 7.5 7.75 8        NA 0.3227486
   5:  5 9.00 9.0 9.00 9        NA 0.0000000

I've learned quite a bit searching around, going through the DT tutorials/examples. This question is very similar to what I was hoping to do.

Why does the standard deviation function work and the mean function return NA?

Edit: Using Ricardo Saporta's solution:

test[,`:=`(mean_test = apply(.SD, 1, mean), sd_test = apply(.SD, 1, sd),by=id,.SDcols=c('A','B','C','D')]

> test
   id    A   B    C D mean_test   sd_test
1:  1 2.00 3.0 4.00 5     3.500 1.2909944
2:  2 3.75 4.5 5.25 6     4.875 0.9682458
3:  3 5.50 6.0 6.50 7     6.250 0.6454972
4:  4 7.25 7.5 7.75 8     7.625 0.3227486
5:  5 9.00 9.0 9.00 9     9.000 0.0000000

Ricardo Saporta · Accepted Answer

.SD is itself a data.table
Thus, when you take mean(.SD) you are (attempting) to take the mean of an entire data.table

The function mean() does not know what to do with the data.table and returns NA

Have a look

## the .SD in your question is the same as 
test[, c('A','B','C','D')]

## try taking its mean
mean(test[, c('A','B','C','D')])

# Warning in mean.default(test[, c("A", "B", "C", "D")]) :
#   argument is not numeric or logical: returning NA
# [1] NA

try this instead

use lapply(.SD, mean) for column-wise or apply(.SD, 1, mean) for row-wise

David Arenburg · Answer

You can make mean work by using rowMeans instead, and thus avoid using apply (similar to the linked question)

test[,`:=`(mean_test = rowMeans(.SD), 
           sd_test = sd(.SD)),
     by=id,.SDcols=c('A','B','C','D')]
test
#    id    A   B    C D mean_test   sd_test
# 1:  1 2.00 3.0 4.00 5     3.500 1.2909944
# 2:  2 3.75 4.5 5.25 6     4.875 0.9682458
# 3:  3 5.50 6.0 6.50 7     6.250 0.6454972
# 4:  4 7.25 7.5 7.75 8     7.625 0.3227486
# 5:  5 9.00 9.0 9.00 9     9.000 0.0000000

Valentin · Answer

Rather as a fun fact, one can use a vector of columns in mean() and sd():

test[, `:=` (mean = mean(c(A,B,C,D)),
             sd   = sd(c(A,B,C,D))),  by=id]
test
#    id    A   B    C D   mean        sd
# 1:  1 2.00 3.0 4.00 5  3.500 1.2909944
# 2:  2 3.75 4.5 5.25 6  4.875 0.9682458
# 3:  3 5.50 6.0 6.50 7  6.250 0.6454972
# 4:  4 7.25 7.5 7.75 8  7.625 0.3227486
# 5:  5 9.00 9.0 9.00 9  9.000 0.0000000

And you can also use quote() and eval():

cols <- quote(c(A,B,C,D))
test[, ':=' (mean = mean(eval(cols)), 
             sd  = sd(eval(cols))),  by=id]

How to create mean and s.d. columns in data.table

Tags:

r

data.table

mean

calculated-columns

nfmcclure

3 Answers

Have a look

try this instead

Ricardo Saporta

David Arenburg

Valentin

Recent Activity

Donate For Us

How to create mean and s.d. columns in data.table

Tags:

r

data.table

mean

calculated-columns

nfmcclure

3 Answers

Have a look

try this instead

Ricardo Saporta

David Arenburg

Valentin

Related questions

Recent Activity

Donate For Us