Is there a vectorized operation to calculate the cumulative and rolling standard deviation (SD) of a pandas DataFrame?
For example, I want to add a column 'c' that holds the cumulative SD of column 'a': at index 0 it shows NaN because there is only one data point, at index 1 it is the SD of the first two data points, and so on.
The same question applies to the rolling SD. Is there an efficient way to calculate both without iterating through df.itertuples()?
import numpy as np
import pandas as pd
def main():
    np.random.seed(123)
    df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
    print(df)
if __name__ == '__main__':
    main()
For the cumulative SD based on column 'a', use rolling with a window size equal to the length of the DataFrame and min_periods=2:
df['c'] = df['a'].rolling(len(df), min_periods=2).std()
Output:
          a         b         c
0 -1.085631  0.997345       NaN
1  0.282978 -1.506295  0.967753
2 -0.578600  1.651437  0.691916
3 -2.426679 -0.428913  1.133892
4  1.265936 -0.866740  1.395750
5 -0.678886 -0.094709  1.250335
6  1.491390 -0.638902  1.374933
7 -0.443982 -0.434351  1.274843
8  2.205930  2.186786  1.450563
9  1.004054  0.386186  1.403721
And for rolling SD based on two values at a time:
df['c'] = df['a'].rolling(2).std()
Output:
          a         b         c
0 -1.085631  0.997345       NaN
1  0.282978 -1.506295  0.967753
2 -0.578600  1.651437  0.609228
3 -2.426679 -0.428913  1.306789
4  1.265936 -0.866740  2.611073
5 -0.678886 -0.094709  1.375197
6  1.491390 -0.638902  1.534617
7 -0.443982 -0.434351  1.368514
8  2.205930  2.186786  1.873771
9  1.004054  0.386186  0.849855
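As a sanity check, the two one-liners above can be compared against explicit reference calculations. This is only a sketch: it assumes the same seeded data as in the question, and the variable names (cumulative, expected, rolling2, expected2) are made up for illustration. It relies on the fact that pandas' std() defaults to the sample SD (ddof=1).

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])

# Vectorized cumulative SD: the window spans the whole frame, min_periods=2.
cumulative = df['a'].rolling(len(df), min_periods=2).std()

# Reference loop: sample SD (ddof=1) of the first i+1 values.
# A single value has no sample SD, so the first entry is NaN.
expected = pd.Series(
    [df['a'].iloc[:i + 1].std(ddof=1) for i in range(len(df))],
    index=df.index,
)

# Vectorized rolling SD over two values at a time.
rolling2 = df['a'].rolling(2).std()

# For exactly two values, the sample SD reduces to |x_i - x_{i-1}| / sqrt(2).
expected2 = df['a'].diff().abs() / np.sqrt(2)

print(np.allclose(cumulative, expected, equal_nan=True))
print(np.allclose(rolling2, expected2, equal_nan=True))
```

Both comparisons should agree up to floating-point tolerance, which confirms that no explicit iteration over rows is needed.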
I think that if by rolling you mean cumulative, then the right term in Pandas is expanding:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.expanding.html#pandas.DataFrame.expanding
It also accepts a min_periods argument.
df['c'] = df['a'].expanding(2).std()
The case for rolling was handled by Scott Boston, and it is unsurprisingly called rolling in Pandas.
The advantage of expanding over rolling(len(df), ...) is that you don't need to know the length in advance. That is very useful, for example, with groupby objects.
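To illustrate the groupby case, here is a minimal sketch. The 'group' column, its labels, and the numbers are hypothetical and not from the question; the point is only that expanding restarts within each group without any group length being passed in.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups of different lengths.
df = pd.DataFrame({
    'group': ['x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'a': [1.0, 2.0, 4.0, 10.0, 10.0, 13.0, 16.0],
})

# Cumulative SD within each group. The result is indexed by
# (group, original index), so the group level is dropped to
# align the values back onto df.
df['c'] = (
    df.groupby('group')['a']
      .expanding(min_periods=2)
      .std()
      .reset_index(level=0, drop=True)
)

# On an ungrouped column, expanding(2) and rolling(len(df), min_periods=2)
# give the same cumulative SD.
same = np.allclose(
    df['a'].expanding(2).std(),
    df['a'].rolling(len(df), min_periods=2).std(),
    equal_nan=True,
)

print(df)
print(same)
```

The first row of each group is NaN because a single data point has no sample SD. With rolling, each group would need its own window length, which is why expanding is the more natural fit here.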