Rolling and cumulative standard deviation in a Python dataframe

Question

Is there a vectorized operation to calculate the cumulative and rolling standard deviation (SD) of a pandas DataFrame?

For example, I want to add a column 'c' that holds the cumulative SD of column 'a': at index 0 it shows NaN because there is only 1 data point, at index 1 it is the SD of the first 2 data points, and so on.

The same question applies to rolling SD. Is there an efficient way to calculate it without iterating through df.itertuples()?

import numpy as np
import pandas as pd

def main():
    np.random.seed(123)
    df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
    print(df)

if __name__ == '__main__':
    main()

Scott Boston · Accepted Answer

For cumulative SD based on column 'a', use rolling with a window size equal to the length of the dataframe and min_periods=2:

df['c'] = df['a'].rolling(len(df), min_periods=2).std()

Output:

          a         b         c
0 -1.085631  0.997345       NaN
1  0.282978 -1.506295  0.967753
2 -0.578600  1.651437  0.691916
3 -2.426679 -0.428913  1.133892
4  1.265936 -0.866740  1.395750
5 -0.678886 -0.094709  1.250335
6  1.491390 -0.638902  1.374933
7 -0.443982 -0.434351  1.274843
8  2.205930  2.186786  1.450563
9  1.004054  0.386186  1.403721
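As a sanity check, here is a minimal sketch that compares the full-length rolling window against a slow, explicit computation of the sample SD (ddof=1, which is the pandas default for std) over each prefix of column 'a'. The names cum_sd and reference are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])

# Cumulative SD via a window as long as the whole frame, as in the answer.
cum_sd = df['a'].rolling(len(df), min_periods=2).std()

# Slow reference: sample SD (ddof=1) of a[0..i] for each row i.
# Row 0 has a single data point, so its SD is undefined (NaN).
values = df['a'].to_numpy()
reference = pd.Series(
    [np.nan] + [np.std(values[: i + 1], ddof=1) for i in range(1, len(values))],
    index=df.index,
)

# True when both approaches agree (NaN positions included).
print(np.allclose(cum_sd, reference, equal_nan=True))
```

The loop is there only for verification; the rolling call is the vectorized version.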

And for rolling SD based on two values at a time:

df['c'] = df['a'].rolling(2).std()

Output:

          a         b         c
0 -1.085631  0.997345       NaN
1  0.282978 -1.506295  0.967753
2 -0.578600  1.651437  0.609228
3 -2.426679 -0.428913  1.306789
4  1.265936 -0.866740  2.611073
5 -0.678886 -0.094709  1.375197
6  1.491390 -0.638902  1.534617
7 -0.443982 -0.434351  1.368514
8  2.205930  2.186786  1.873771
9  1.004054  0.386186  0.849855
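The same call works with other window sizes and on the whole DataFrame. The sketch below is only an illustration: the window size of 3 and the variable names are arbitrary choices, not something from the question.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])

# Rolling sample SD over a 3-row window, for every column at once.
# The first two rows are NaN because the window is not full yet.
roll3 = df.rolling(3).std()

# min_periods lets the window produce a value before it is full:
# here row 1 already gets the SD of the first two points.
roll3_early = df['a'].rolling(3, min_periods=2).std()

# Population SD (divide by N instead of N - 1) via the ddof argument.
roll3_pop = df['a'].rolling(3).std(ddof=0)

print(roll3)
```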

Tomasz Gandor · Answer

If by "rolling" you mean cumulative, the corresponding term in pandas is expanding:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.expanding.html#pandas.DataFrame.expanding

It also accepts a min_periods argument.

df['c'] = df['a'].expanding(2).std()
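A quick sketch to confirm that this gives the same result as rolling(len(df), min_periods=2) on the question's data. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])

# Expanding window: each row sees all rows from the start up to itself.
expanding_sd = df['a'].expanding(min_periods=2).std()

# Equivalent formulation with a rolling window as long as the frame.
rolling_sd = df['a'].rolling(len(df), min_periods=2).std()

# True when the two Series match, including the leading NaN.
print(np.allclose(expanding_sd, rolling_sd, equal_nan=True))
```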

The case for rolling was handled by Scott Boston, and it is unsurprisingly called rolling in Pandas.

The advantage of expanding over rolling(len(df), ...) is that you don't need to know the length in advance. This is very useful, for example, with groupby, where each group can have a different length.
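A minimal sketch of the groupby case, assuming a made-up frame with a hypothetical 'group' column (neither the data nor the column name come from the question). Each group gets its own cumulative SD without its length being passed anywhere.

```python
import pandas as pd

# Hypothetical data: two groups of different lengths.
df = pd.DataFrame({
    'group': ['x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'a': [1.0, 2.0, 4.0, 10.0, 10.0, 13.0, 7.0],
})

# Expanding SD restarts inside each group. The result is indexed by
# (group, original row index).
per_group = df.groupby('group')['a'].expanding(min_periods=2).std()

# Drop the group level so the values align with df's own index.
df['c'] = per_group.reset_index(level=0, drop=True)

print(df)
```

The first row of each group is NaN, since a single data point has no sample SD.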

Tags: python, pandas, dataframe, standard-deviation

Roy

2 Answers

Scott Boston

Tomasz Gandor
