
Calculating standard deviation across rows

Tags: r, dplyr

Say I have the following data:

colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df <- data.frame(colA, colB, colC, colD)
df
#    colA colB colC colD
# 1 SampA   21   15   10
# 2 SampB   20   14   22
# 3 SampC   30   12   18

I want to get the row means and standard deviations for the values in columns B-D.

I can calculate the rowMeans as follows:

library(dplyr)
df %>% select(., matches("colB|colC|colD")) %>% mutate(rmeans = rowMeans(.))
#   colB colC colD   rmeans
# 1   21   15   10 15.33333
# 2   20   14   22 18.66667
# 3   30   12   18 20.00000

But when I try to calculate the standard deviations using sd(), it throws an error.

df %>% select(., matches("colB|colC|colD")) %>% mutate(rsds = sapply(., sd(.)))
Error in is.data.frame(x) : 
  (list) object cannot be coerced to type 'double'

So my question is: how do I calculate the standard deviations here?

Edit: I tried sapply() with sd() having read the first answer here.

Additional edit: not necessarily looking for a 'tidy' solution (base R also works just fine).

Dunois asked Mar 24 '19 18:03


2 Answers

I'm not sure how old/new dplyr's c_across functionality is relative to the prior answers on this page, but here's a solution that is almost directly cut and pasted from the documentation for dplyr::c_across:

df %>% 
  rowwise() %>% 
  mutate(
     mean = mean(c_across(colB:colD)),
     sd = sd(c_across(colB:colD))
  )

# A tibble: 3 x 6
# Rowwise: 
  colA   colB  colC  colD  mean    sd
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SampA    21    15    10  15.3  5.51
2 SampB    20    14    22  18.7  4.16
3 SampC    30    12    18  20    9.17
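
As for why the attempt in the question fails: in sapply(., sd(.)), the second argument is a call rather than a function, so R evaluates sd() on the whole data frame, which cannot be coerced to a numeric vector. That is the error shown. Even the corrected sapply(., sd) would iterate over columns, not rows. A minimal base R sketch of the difference, assuming the df from the question (the names cols, col_sds and row_sds are only illustrative):

```r
df <- data.frame(
  colA = c("SampA", "SampB", "SampC"),
  colB = c(21, 20, 30),
  colC = c(15, 14, 12),
  colD = c(10, 22, 18)
)
cols <- c("colB", "colC", "colD")

# sapply() walks over the columns of a data frame: one sd per column
col_sds <- sapply(df[cols], sd)

# apply() with MARGIN = 1 walks over the rows: one sd per row
row_sds <- apply(df[cols], 1, sd)

# Attach the row-wise statistics to the original data
df$rmeans <- rowMeans(df[cols])
df$rsds <- row_sds
df
```

Note that apply() converts the selected columns to a matrix first, so this only makes sense when all of them are numeric.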
D. Woods answered Sep 24 '22 03:09


Try this, using rowSds from the matrixStats package:

library(dplyr)
library(matrixStats)

columns <- c('colB', 'colC', 'colD')

df %>% 
  mutate(Mean = rowMeans(.[columns]),
         stdev = rowSds(as.matrix(.[columns])))

Returns

   colA colB colC colD     Mean    stdev
1 SampA   21   15   10 15.33333 5.507571
2 SampB   20   14   22 18.66667 4.163332
3 SampC   30   12   18 20.00000 9.165151
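
If installing matrixStats is not an option, the same numbers should follow directly from the definition of the sample standard deviation, sqrt(sum((x - mean(x))^2) / (n - 1)), using only vectorized base R. A sketch under that assumption, reusing the df and columns from this answer (m, centered and row_sd are illustrative names):

```r
df <- data.frame(
  colA = c("SampA", "SampB", "SampC"),
  colB = c(21, 20, 30),
  colC = c(15, 14, 12),
  colD = c(10, 22, 18)
)
columns <- c("colB", "colC", "colD")

# Work on a numeric matrix so arithmetic is element-wise
m <- as.matrix(df[columns])

# rowMeans(m) has one value per row; it is recycled down each column,
# so every entry is centered on the mean of its own row
centered <- m - rowMeans(m)

# Sample standard deviation per row (denominator n - 1, like sd())
row_sd <- sqrt(rowSums(centered^2) / (ncol(m) - 1))
row_sd
```

This avoids a per-row function call, which may matter on data with many rows.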

Your data

colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df <- data.frame(colA, colB, colC, colD)
df
Hector Haffenden answered Sep 24 '22 03:09