
Calculating standard deviation of each row

Tags: r

I am trying to use rowSds() to calculate each row's standard deviation so that I can pick the rows with a high SD to graph.

My data frame is called xx and looks like this:

head(xx,1)
     Job     variable 2012-02-23 2012-02-24 2012-02-25 2012-02-27 2012-02-28 2012-02-29 2012-03-01 2012-03-02 2012-03-03 2012-03-05 2012-03-06 2012-03-07 2012-03-08 2012-03-09 2012-03-10 2012-03-12 2012-03-13 2012-03-14
1 A Duration        152        424         NA        499        320        117        211        363         NA        605         76        309        204        185         NA         25        733        500
  2012-03-15 2012-03-16 2012-03-17 2012-03-19 2012-03-20 2012-03-21 2012-03-22 2012-03-23 2012-03-24 2012-03-26 2012-03-27 2012-03-28 2012-03-29 2012-03-30 2012-03-31 2012-04-02 2012-04-03 2012-04-04 2012-04-05 2012-04-06
1        521        601         NA        229        758        421        334        659         NA        419        423        444        289        594         NA        327        533        183        211        235
  2012-04-07 2012-04-09 2012-04-10 2012-04-11 2012-04-12 2012-04-13 2012-04-14 2012-04-16 2012-04-17 2012-04-18 2012-04-19 2012-04-20 2012-04-21 2012-04-23 2012-04-24 2012-04-25 2012-04-26 2012-04-27 2012-04-28 2012-04-30
1         NA        225        419        236        218        188         NA        205        547        153        196        200         NA        259        257        208        302        244         NA        806
  2012-05-01 2012-05-02 2012-05-03 2012-05-04 2012-05-05 2012-05-07 2012-05-08 2012-05-09 2012-05-10 2012-05-11 2012-05-12 2012-05-14 2012-05-15 2012-05-16 2012-05-17 2012-05-18 2012-05-19 2012-05-21 2012-05-22 2012-05-23
1        402        492       1078        440         NA        382        576       1105        511        368         NA        360        381       1152        718        353         NA        408        413        935
  2012-05-24 2012-05-25 2012-05-26 2012-05-28 2012-05-29 2012-05-30 2012-05-31 2012-06-01 2012-06-02 2012-06-04 2012-06-05 2012-06-06 2012-06-07 2012-06-08 2012-06-09 2012-06-11 2012-06-12 2012-06-13 2012-06-14 2012-06-15
1        306        277         NA        253        367        977        557        432         NA        328        521        467        972       1556         NA        386       1394        401        857        857
  2012-06-16 2012-06-18 2012-06-19 2012-06-20 2012-06-21 2012-06-22 2012-06-23 2012-06-25 2012-06-26 2012-06-27 2012-06-28 2012-06-29 2012-06-30 2012-07-02 2012-07-03 2012-07-04 2012-07-05 2012-07-06 2012-07-07 2012-07-09
1         NA       1056        324        329        327        325         NA        341        268        231        245        301         NA        283        365        297        310        260         NA        254
  2012-07-10 2012-07-11 2012-07-12 2012-07-13 2012-07-14 2012-07-16 2012-07-17 2012-07-18 2012-07-19 2012-07-20 2012-07-21 2012-07-23 2012-07-24 2012-07-25 2012-07-26 2012-07-27 2012-07-28 2012-07-30 2012-07-31 2012-08-01
1        283        395        273        273         NA        278        243        210        356        267         NA        442        483        271        327        271         NA        716        598        577
  2012-08-02 2012-08-03 2012-08-06 2012-08-07 2012-08-08 2012-08-09 2012-08-10 2012-08-13 2012-08-14 2012-08-15 2012-08-16 2012-08-17 2012-08-20 2012-08-21 2012-08-22 2012-08-23 2012-08-24 2012-08-27 2012-08-28 2012-08-29
1        345        403        318        522        333        259        404        244        240        288        245         22        738        530        390        648        294        403        381        724
  2012-08-30 2012-08-31 2012-09-03 2012-09-04 2012-09-05 2012-09-06 2012-09-07 2012-09-10 2012-09-11 2012-09-12 2012-09-13 2012-09-14 2012-09-17 2012-09-18 2012-09-19 2012-09-20 2012-09-21 2012-09-24 2012-09-25 2012-09-26
1        740        575        558        785        883        501        901        500        285        174        562       1047        603        990        289        173        253        512        236        278
  2012-09-27 2012-09-28 2012-10-01 2012-10-02 2012-10-03 2012-10-04 2012-10-05 2012-10-08 2012-10-09 2012-10-10 2012-10-11
1        173        277        217        291        197        308        124        387        369        250        242

I am trying to calculate each row's standard deviation and assign it to a new column named sd:

xx$sd<-rowSds(xx)

I get this error:

Error in apply(na.omit(as.matrix(x), ...), 1, FUN, ...) : 
  error in evaluating the argument 'X' in selecting a method for function 'apply': Error in na.omit(as.matrix(x), ...) : 
  error in evaluating the argument 'object' in selecting a method for function 'na.omit': Error in `colnames<-`(`*tmp*`, value = c("2012-02-23", "2012-02-24", "2012-02-25",  : 
  length of 'dimnames' [2] not equal to array extent

Any ideas how I can omit the NA values when calculating the SD? Is my syntax correct?

user1471980 asked Oct 12 '12 14:10



1 Answer

You can use the apply and transform functions:

set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10       SD
1  NA 12 17 18 19 16 12 13 20  14 3.041381
2  14 12 13 13 14 18 16 17 20  10 3.020302
3  11 19 NA 12 19 19 19 20 12  20 3.865805
4  10 11 20 12 15 17 18 17 18  12 3.496029
5  12 15 NA 14 20 18 16 11 14  18 2.958040
6  19 11 10 20 13 14 17 16 10  16 3.596294
7  14 16 17 15 10 11 15 15 11  16 2.449490
8  NA 10 15 19 19 12 15 15 19  14 3.201562
9  11 NA NA 20 20 14 14 17 14  19 3.356763
10 15 13 14 15 NA 13 15 NA 15  12 1.195229

From ?apply you can see that the ... argument passes optional arguments through to FUN. In this case, na.rm = TRUE is forwarded to sd so that NA values are omitted.
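To make the effect of the forwarded argument concrete, here is a minimal sketch on a tiny data frame. The name toy and its values are made up for illustration and do not come from the question.

```r
# Hypothetical toy data: each row contains one NA.
toy <- data.frame(a = c(1, 4), b = c(2, NA), c = c(NA, 8))

# Without na.rm, a single NA in a row makes sd() return NA for that row.
sd_with_na <- apply(toy, 1, sd)

# na.rm = TRUE is not an argument of apply(); it travels through ... to sd().
sd_rows <- apply(toy, 1, sd, na.rm = TRUE)

# Equivalent explicit form with an anonymous function.
sd_rows2 <- apply(toy, 1, function(r) sd(r, na.rm = TRUE))

print(sd_with_na)
print(sd_rows)
```

The anonymous-function form is more verbose, but it can be easier to read when FUN needs several extra arguments.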

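Using rowSds from the matrixStats package also requires setting na.rm = TRUE to omit NA values. rowSds expects a numeric matrix, so convert the data frame with as.matrix first:

library(matrixStats)
transform(X, SD = rowSds(as.matrix(X), na.rm = TRUE)) # same result as before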
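The data frame in the question also carries two label columns (Job and variable), which X above does not. The following is only a sketch of one way to handle that, assuming the date columns are stored as numeric. The frame xx_demo, its values, and the threshold of 100 are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-in for xx: two label columns plus a few date columns.
xx_demo <- data.frame(
  Job = c("A", "B", "C"),
  variable = "Duration",
  `2012-02-23` = c(152, 300, 10),
  `2012-02-24` = c(424, 310, NA),
  `2012-02-25` = c(NA, 305, 900),
  check.names = FALSE,       # keep the date strings as column names
  stringsAsFactors = FALSE
)

# Flag the numeric columns; including Job/variable would make as.matrix()
# return a character matrix, on which sd() cannot work.
num_cols <- vapply(xx_demo, is.numeric, logical(1))

# Row-wise SD over the numeric columns only, ignoring NA values.
xx_demo$sd <- apply(as.matrix(xx_demo[, num_cols]), 1, sd, na.rm = TRUE)

# Keep the rows whose SD exceeds an arbitrary, illustrative threshold.
high_sd <- xx_demo[xx_demo$sd > 100, ]
print(high_sd$Job)
```

Because num_cols is computed before the sd column is added, the new column is not fed back into the calculation if the assignment line is run again.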
Jilber Urbina answered Oct 22 '22 16:10