Does anyone know an efficient function/method, such as pandas.rolling_mean, that would calculate the rolling difference of an array?
This is my closest solution:
roll_diff = pd.Series(values).diff(periods=1)
However, it only calculates a single-step rolling difference. Ideally the step size would be editable (i.e. the difference between the current time step and the last n steps).
I've also written this, but for larger arrays, it is quite slow:
import numpy as np
def roll_diff(values, step):
    diff = []
    for i in np.arange(step, len(values)-1):
        # indices of the `step` values immediately before position i
        pers_window = np.arange(i-1, i-step-1, -1)
        diff.append(np.abs(values[i] - np.mean(values[pers_window])))
    diff = np.pad(diff, (0, step+1), 'constant')
    return diff
Pandas Series: rolling() function. The rolling() function provides rolling window calculations. Its window parameter is the size of the moving window, that is, the number of observations used to calculate the statistic. With an integer window, each window has a fixed size.
The difference between rows or columns of a pandas DataFrame is found with the diff() method. The axis parameter decides whether the difference is calculated between rows or between columns. When the periods parameter is positive, the row that many positions earlier is subtracted from the current row.
Pandas DataFrame diff() method. The diff() method returns a DataFrame with the difference between the values of each row and, by default, the previous row. The row to compare with can be specified with the periods parameter.
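As a minimal sketch of the periods parameter described above, the step size of the difference can be set directly on diff(). The series values are made up for illustration, and n is a hypothetical name for the step size.

```python
import pandas as pd

s = pd.Series([1, 3, 6, 1, -5, 6, 4, 1, 6])  # illustrative data
n = 4  # hypothetical step size

# Each element minus the element n positions earlier.
# The first n results are NaN because there is no earlier value.
n_step_diff = s.diff(periods=n)
print(n_step_diff.tolist())
```

This stays inside pandas and avoids a Python-level loop, so it should scale well to larger arrays.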
What about:
import pandas
x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500]},
    index=[0, 1, 2, 3, 4, 5, 6, 7])
x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])
In general, you can replace the lambda function with your own function. Note that in this case the first item will be NaN.
Defining the following:
n_steps = 2
def my_fun(x):
    return x.iloc[-1] - x.iloc[0]
x['x_1'].rolling(window=n_steps).apply(my_fun)
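you can compute the difference between the last and first values of each window. Note that a window of n_steps rows spans n_steps - 1 steps, so use window=n_steps + 1 if the difference should cover n_steps steps.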
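A hedged sketch of the same idea, assuming a pandas version in which rolling().apply() accepts raw=True. With raw=True the function receives a NumPy array, so plain positional indexing works without iloc. The name last_minus_first is hypothetical, and the window is set to n_steps + 1 so that the difference spans n_steps rows.

```python
import pandas as pd

x = pd.DataFrame({'x_1': [0, 1, 2, 3, 0, 1, 2, 500]})
n_steps = 2  # number of rows the difference should span

def last_minus_first(window):
    # With raw=True, `window` is a NumPy array, so plain indexing works.
    return window[-1] - window[0]

# A window of n_steps + 1 rows covers exactly n_steps steps.
rolled = x['x_1'].rolling(window=n_steps + 1).apply(last_minus_first, raw=True)

# For comparison: the built-in n-step difference.
builtin = x['x_1'].diff(periods=n_steps)
print(rolled.tolist())
print(builtin.tolist())
```

For a plain difference, diff(periods=n_steps) is the simpler choice. The rolling().apply() form is mainly useful when the function does more than subtract two values.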
You can do the same thing as in https://stackoverflow.com/a/48345749/1011724 if you work directly on the underlying numpy array:
import numpy as np
diff_kernel = np.array([1,-1])
np.convolve(rs, diff_kernel, 'same')
where rs is your pandas series
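The kernel idea can be extended to an n-step difference. The following is only a sketch: convolve_diff is a hypothetical helper name, n is assumed to be at least 1, and it uses mode='valid' rather than 'same' so that only fully overlapping positions are returned.

```python
import numpy as np

def convolve_diff(values, n):
    # Kernel [1, 0, ..., 0, -1] of length n + 1: convolving with it
    # yields values[i] - values[i - n].
    kernel = np.zeros(n + 1)
    kernel[0] = 1.0
    kernel[-1] = -1.0
    # 'valid' keeps only positions where the kernel fully overlaps the
    # data, so the result has len(values) - n elements.
    return np.convolve(values, kernel, mode='valid')

rs = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])  # illustrative data
print(convolve_diff(rs, 4))
```

Because the kernel holds floats, the result is a float array even for integer input.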
This should work:
import numpy as np
x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])
def running_diff(arr, N):
    return np.array([arr[i] - arr[i-N] for i in range(N, len(arr))])
running_diff(x, 4) # array([-6, 3, -2, 0, 11])
For a given pd.Series, you will have to define what you want for the first few items. The example below just returns the initial series values.
s_roll_diff = np.hstack((s.values[:4], running_diff(s.values, 4)))
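This works because you can assign a np.array directly to a pd.DataFrame column. Use bracket notation to create the new column, since attribute assignment does not add one, e.g. for a column s:
df['s_roll_diff'] = np.hstack((df.s.values[:4], running_diff(df.s.values, 4)))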
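The list comprehension in running_diff can also be written with array slicing, which avoids the Python-level loop. This is a sketch rather than part of the original answer: running_diff_vec is a hypothetical name, and N is assumed to be at least 1.

```python
import numpy as np

def running_diff_vec(arr, N):
    # arr[N:] holds elements N..end and arr[:-N] holds elements 0..end-N,
    # so subtracting pairs each element with the one N positions earlier.
    arr = np.asarray(arr)
    return arr[N:] - arr[:-N]

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])
print(running_diff_vec(x, 4))
```

The result has len(arr) - N elements, the same as running_diff, so the same hstack padding applies.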
If you get KeyError: 0, try with iloc:
import pandas
x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500]},
    index=[0, 1, 2, 3, 4, 5, 6, 7])
x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])