Pandas - Rolling slope calculation

How can I calculate the slope of each column's rolling(window=60) values, stepping by 5?

I only need one value every 5 minutes; I don't need a result for every record.

Here's a sample dataframe and the desired result:

df
Time                A    ...      N
2016-01-01 00:00  1.2    ...    4.2
2016-01-01 00:01  1.2    ...    4.0
2016-01-01 00:02  1.2    ...    4.5
2016-01-01 00:03  1.5    ...    4.2
2016-01-01 00:04  1.1    ...    4.6
2016-01-01 00:05  1.6    ...    4.1
2016-01-01 00:06  1.7    ...    4.3
2016-01-01 00:07  1.8    ...    4.5
2016-01-01 00:08  1.1    ...    4.1
2016-01-01 00:09  1.5    ...    4.1
2016-01-01 00:10  1.6    ...    4.1
....

result
Time                A    ...      N
2016-01-01 00:04  xxx    ...    xxx
2016-01-01 00:09  xxx    ...    xxx
2016-01-01 00:14  xxx    ...    xxx
...

Can the df.rolling function be applied to this problem?

It's fine if there are NaNs in the window, meaning the subset could contain fewer than 60 values.

Lcy asked Feb 09 '17 13:02


2 Answers

Try this:

import numpy as np
# Roll over column "A" directly; grouping by "Time" would leave one row per group.
windows = df["A"].rolling(60)
df["out"] = windows.apply(lambda x: np.polyfit(range(60), x, 1)[0], raw=True)
frquestions answered Nov 09 '22 01:11


It seems that what you want is rolling with a specific step size. However, older versions of pandas do not support a step size in rolling; a step parameter was added in pandas 1.5.0, where it is equivalent to slicing the result with [::step].

If the data size is not too large, just perform rolling on all data and select the results using indexing.

Here's a sample dataset. For simplicity, the time column is represented using integers.

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(500, 1) * 10, columns=['a'])
            a
0    8.714074
1    0.985467
2    9.101299
3    4.598044
4    4.193559
..        ...
495  9.736984
496  2.447377
497  5.209420
498  2.698441
499  3.438271

Then roll and calculate the slopes:

def calc_slope(x):
    slope = np.polyfit(range(len(x)), x, 1)[0]
    return slope

# Set min_periods=2 so that windows with fewer than 60 values are still evaluated.
# Use [4::5] to keep every 5th result, starting at row 4.
result = data.rolling(60, min_periods=2).apply(calc_slope)[4::5]

The result will look like this (the exact values vary because the data is random):

            a
4   -0.542845
9    0.084953
14   0.155297
19  -0.048813
24  -0.011947
..        ...
479 -0.004792
484 -0.003714
489  0.022448
494  0.037301
499  0.027189
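
The question allows NaN inside a window, and np.polyfit does not skip missing values by itself, so the slope function may need to drop them first. Below is a minimal sketch of that idea, applied to every column of a frame indexed by minute. The helper name nan_safe_slope and the sample frame are made up for illustration; the slope is taken against the row position inside the window, not against the timestamp.

```python
import numpy as np
import pandas as pd


def nan_safe_slope(y):
    """Least-squares slope of y against its position in the window, skipping NaNs."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))
    mask = ~np.isnan(y)
    if mask.sum() < 2:
        return np.nan  # a line needs at least two points
    return np.polyfit(x[mask], y[mask], 1)[0]


# Hypothetical sample: one row per minute, two columns, one missing value.
index = pd.date_range("2016-01-01 00:00", periods=120, freq="min")
df = pd.DataFrame(
    {"A": np.arange(120, dtype=float), "N": 2.0 * np.arange(120) + 1.0},
    index=index,
)
df.iloc[7, 0] = np.nan

# raw=True passes each window to the function as a NumPy array.
slopes = df.rolling(60, min_periods=2).apply(nan_safe_slope, raw=True)

# Keep every 5th row, starting at 00:04.
result = slopes.iloc[4::5]
```

Because rolling is applied to the whole frame, each column gets its own slope series, and the index of result keeps the original timestamps (00:04, 00:09, 00:14, ...).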

Alternatively, see the post "step size in pandas.DataFrame.rolling"; its first answer provides a NumPy way to achieve this.
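
For reference, here is one possible NumPy-only sketch of stepped rolling slopes. It is not the code from the linked answer; it assumes NumPy 1.20 or newer (for sliding_window_view), handles full windows only, and does not treat NaN specially. The function name stepped_slopes is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stepped_slopes(values, window, step):
    """Slopes of full windows ending at positions window-1, window-1+step, ..."""
    values = np.asarray(values, dtype=float)
    # Row i of `windows` is values[i*step : i*step + window].
    windows = sliding_window_view(values, window)[::step]
    # Centered x positions, so the intercept drops out of the slope formula.
    x = np.arange(window) - (window - 1) / 2.0
    # Closed-form least-squares slope: sum(x_c * y) / sum(x_c ** 2)
    return windows @ x / (x @ x)


rng = np.random.default_rng(0)
values = rng.random(500) * 10
slopes = stepped_slopes(values, window=60, step=5)
```

With window=60 and step=5, the windows end at rows 59, 64, ..., 499, which are the full-window rows of the [4::5] selection above. The slopes come from a single matrix product instead of one np.polyfit call per window.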

Cheng answered Nov 09 '22 00:11