Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling difference in Pandas

Tags:

python

pandas

Does anyone know an efficient function/method such as pandas.rolling_mean, that would calculate the rolling difference of an array

This is my closest solution:

roll_diff = pd.Series(values).diff(periods=1)

However, it only calculates single-step rolling difference. Ideally the step size would be editable (i.e. difference between current time step and n last steps).

I've also written this, but for larger arrays, it is quite slow:

def roll_diff(values,step):
    diff = []
    for i in np.arange(step, len(values)-1):
        pers_window = np.arange(i-1,i-step-1,-1)
        diff.append(np.abs(values[i] - np.mean(values[pers_window])))
    diff = np.pad(diff, (0, step+1), 'constant')
    return diff
like image 748
William Baker Morrison Avatar asked Jan 30 '18 09:01

William Baker Morrison


People also ask

What does rolling do in pandas?

Pandas Series: rolling() function The rolling() function is used to provide rolling window calculations. Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

How do you get the difference in rows in pandas?

Difference between rows or columns of a pandas DataFrame object is found using the diff() method. The axis parameter decides whether difference to be calculated is between rows or between columns. When the periods parameter assumes positive values, difference is found by subtracting the previous row from the next row.

What does diff () do in pandas?

Pandas DataFrame diff() Method The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.


4 Answers

What about:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])

in general you can replace the lambda function with your own function. Note that in this case the first item will be NaN.

Update

Defining the following:

n_steps = 2
def my_fun(x):
    return x.iloc[-1] - x.iloc[0]

x['x_1'].rolling(window=n_steps).apply(my_fun)

you can compute the differences between values at n_steps.

like image 181
Pierluigi Avatar answered Nov 12 '22 23:11

Pierluigi


You can do the same thing as in https://stackoverflow.com/a/48345749/1011724 if you work directly on the underlying numpy array:

import numpy as np
diff_kernel = np.array([1,-1])
np.convolve(rs,diff_kernel ,'same')

where rs is your pandas series

like image 27
Dan Avatar answered Nov 13 '22 00:11

Dan


This should work:

import numpy as np

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff(arr, N):
    return np.array([arr[i] - arr[i-N] for i in range(N, len(arr))])

running_diff(x, 4)  # array([-6,  3, -2,  0, 11])

For a given pd.Series, you will have to define what you want for the first few items. The below example just returns the initial series values.

s_roll_diff = np.hstack((s.values[:4], running_diff(s.values, 4)))

This works because you can assign a np.array directly to a pd.DataFrame, e.g. for a column s, df.s_roll_diff = np.hstack((df.s.values[:4], running_diff(df.s.values, 4)))

like image 28
jpp Avatar answered Nov 12 '22 23:11

jpp


If you got KeyError: 0, try with iloc:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])
like image 3
Manualmsdos Avatar answered Nov 13 '22 01:11

Manualmsdos