How to calculate the standard deviation of subsets of data down a column in R

I would like to calculate the standard deviation of every 4 values down a column, from the first observation to the last (that is, non-overlapping blocks of 4 rows). I have found lots of answers for moving (rolling) SD functions, but I simply need a line of code that calculates sd() for each block of 4 values and writes the results into a new column of the data frame, as below:

Example data:

Obs Count
1   56
2   29
3   66
4   62
5   49
6   12
7   65
8   81
9   73
10  66
11  71
12  59

Desired output:

Obs Count SD
1   56    16.68
2   29    16.68
3   66    16.68
4   62    16.68
5   49    29.55
6   12    29.55
7   65    29.55
8   81    29.55
9   73    6.24
10  66    6.24
11  71    6.24
12  59    6.24

I tried the code below, but it is obviously incorrect:

a <- for(i in 1: length(df)) sd(df$Count[i:(i+3)])

This should be a very easy task, but I have not been able to find an answer. I am still learning and any help would be appreciated.
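The attempt above does not work for three reasons: length(df) counts the columns of a data frame, not its rows; i advances by 1, so the windows overlap instead of forming blocks of 4; and a for loop returns NULL, so nothing useful is stored in a. A minimal base-R loop that fixes all three, assuming the data frame is called df and has the Count column shown above, might look like this:

```r
# Example data from the question
df <- data.frame(
  Obs   = 1:12,
  Count = c(56, 29, 66, 62, 49, 12, 65, 81, 73, 66, 71, 59)
)

n <- 4                    # block size
df$SD <- NA_real_         # pre-allocate the new column

# Visit the first row of each block (1, 5, 9, ...); nrow() counts rows,
# whereas length() on a data frame counts columns.
for (start in seq(1, nrow(df), by = n)) {
  rows <- start:min(start + n - 1, nrow(df))  # a shorter final block is allowed
  df$SD[rows] <- sd(df$Count[rows])           # store the result inside the loop
}

df
```

If the final block happens to contain a single row, sd() returns NA for it, since a standard deviation needs at least two values.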

asked Dec 11 '22 10:12 by Emily


2 Answers

In base R, you can use the following to create a grouping index that gives every block of 4 rows its own value:

(seq_len(nrow(mydf))-1) %/% 4
# [1] 0 0 0 0 1 1 1 1 2 2 2 2
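Because %/% is integer division, the index increments once every 4 rows, and it still works when the row count is not a multiple of 4; the last group is simply shorter. A quick check with a hypothetical 10-row column, alongside an equivalent 1-based index built with ceiling():

```r
n_rows <- 10  # hypothetical row count that is not a multiple of 4

idx0 <- (seq_len(n_rows) - 1) %/% 4   # 0-based block index
idx1 <- ceiling(seq_len(n_rows) / 4)  # 1-based block index

idx0
# [1] 0 0 0 0 1 1 1 1 2 2
idx1
# [1] 1 1 1 1 2 2 2 2 3 3
```

Either vector can serve as the grouping variable; only which rows share a value matters, not the labels themselves.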

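Using that index as the grouping variable, ave applies sd to each group and repeats each group's result on every row of that group, so the output can be assigned directly as a new column: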

mydf$SD <- ave(mydf$Count, (seq_len(nrow(mydf))-1) %/% 4, FUN = sd)
mydf
#    Obs Count        SD
# 1    1    56 16.680827
# 2    2    29 16.680827
# 3    3    66 16.680827
# 4    4    62 16.680827
# 5    5    49 29.545163
# 6    6    12 29.545163
# 7    7    65 29.545163
# 8    8    81 29.545163
# 9    9    73  6.238322
# 10  10    66  6.238322
# 11  11    71  6.238322
# 12  12    59  6.238322
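If a summary with one SD per block is more useful than a value repeated on every row, the same grouping index can be passed to tapply instead of ave. A minimal sketch, rebuilding mydf from the question's data:

```r
mydf <- data.frame(
  Obs   = 1:12,
  Count = c(56, 29, 66, 62, 49, 12, 65, 81, 73, 66, 71, 59)
)

grp <- (seq_len(nrow(mydf)) - 1) %/% 4  # same block index as above

# One standard deviation per block, named by block index ("0", "1", "2")
block_sd <- tapply(mydf$Count, grp, sd)
block_sd
```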
answered Dec 13 '22 22:12 by A5C1D2H2I1M1N2O1R2T1


An alternative is to use rollapply from the zoo package in combination with rep.

> library(zoo)
> N <- 4 # every four values
> SDs <- rollapply(df[,2], width=N, by=N, sd)
> df$SD <- rep(SDs, each=N)
> df
   Obs Count        SD
1    1    56 16.680827
2    2    29 16.680827
3    3    66 16.680827
4    4    62 16.680827
5    5    49 29.545163
6    6    12 29.545163
7    7    65 29.545163
8    8    81 29.545163
9    9    73  6.238322
10  10    66  6.238322
11  11    71  6.238322
12  12    59  6.238322

You can also do it all at once:

df$SD <- rep( rollapply(df[,2], width=N, by=N, sd), each=N)
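For a zoo-free version of the same compute-then-rep idea, the column can be reshaped into a matrix with N rows (matrix() fills column by column, so each column holds one block) and sd applied to each column. This sketch assumes the row count is an exact multiple of N, which rep(..., each = N) also needs in order to line up with the data frame:

```r
df <- data.frame(
  Obs   = 1:12,
  Count = c(56, 29, 66, 62, 49, 12, 65, 81, 73, 66, 71, 59)
)
N <- 4  # every four values

# Whole blocks only: otherwise matrix() would recycle values into the last column
stopifnot(nrow(df) %% N == 0)

blocks <- matrix(df$Count, nrow = N)  # column j holds rows (j - 1) * N + 1 to j * N
SDs <- apply(blocks, 2, sd)           # one SD per column, i.e. per block
df$SD <- rep(SDs, each = N)           # repeat each block's SD on its N rows
df
```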
answered Dec 13 '22 22:12 by Jilber Urbina