Rolling difference in Pandas

Tags: python, pandas

Does anyone know an efficient function or method, along the lines of pandas.rolling_mean, that calculates the rolling difference of an array?

This is my closest solution:

roll_diff = pd.Series(values).diff(periods=1)

However, it only calculates a single-step rolling difference. Ideally the step size would be configurable (i.e. the difference between the current time step and the last n steps).

I've also written this, but for larger arrays, it is quite slow:

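import numpy as np

def roll_diff(values, step):
    diff = []
    for i in np.arange(step, len(values)-1):
        # indices of the `step` values immediately before position i
        pers_window = np.arange(i-1, i-step-1, -1)
        # absolute deviation of the current value from the mean of that window
        diff.append(np.abs(values[i] - np.mean(values[pers_window])))
    # pad with trailing zeros back to the input length
    diff = np.pad(diff, (0, step+1), 'constant')
    return diff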
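
For comparison, the quantity computed inside the loop (the absolute difference between the current value and the mean of the previous `step` values) can probably be expressed without a Python loop. The following is a minimal sketch, not a drop-in replacement: the name `roll_diff_vectorized` is made up for illustration, and it assumes results should sit at the index of the current value, with NaN where fewer than `step` previous values exist, rather than zero-padded at the end.

```python
import numpy as np
import pandas as pd

def roll_diff_vectorized(values, step):
    """Absolute difference between each value and the mean of the
    `step` values immediately before it (hypothetical helper)."""
    s = pd.Series(values, dtype="float64")
    # shift(1) excludes the current value from the window, so the
    # rolling mean at position i covers positions i-step .. i-1
    prev_mean = s.shift(1).rolling(window=step).mean()
    return (s - prev_mean).abs()

example = roll_diff_vectorized(np.array([1, 3, 6, 1, -5, 6]), step=2)
```

The first `step` entries of `example` are NaN because no full window of earlier values exists for them.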


asked Jan 30 '18 09:01

William Baker Morrison

4 Answers

What about:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])

In general, you can replace the lambda with your own function. Note that in this case the first item will be NaN.

Update

Defining the following:

n_steps = 2
def my_fun(x):
    return x.iloc[-1] - x.iloc[0]

x['x_1'].rolling(window=n_steps).apply(my_fun)

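This computes the difference between the last and the first value of each window. A window of n_steps rows spans n_steps - 1 steps, so use window=n_steps + 1 to get the difference between values that are n_steps rows apart.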
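
As a cross-check, here is a small sketch comparing the rolling approach with the built-in `Series.diff(periods=n)`, which the question already uses with `periods=1`. The sample values are the ones from this answer; the variable names are illustrative. The window is made `n_steps + 1` rows wide so that its first and last rows are exactly `n_steps` apart.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 0, 1, 2, 500])
n_steps = 2

# last minus first value of each window of n_steps + 1 rows
via_rolling = s.rolling(window=n_steps + 1).apply(
    lambda w: w.iloc[-1] - w.iloc[0]
)

# built-in n-step difference: s[i] - s[i - n_steps]
via_diff = s.diff(periods=n_steps)

# both have NaN in the first n_steps positions and agree elsewhere
print(via_rolling.equals(via_diff))
```

If only the plain n-step difference is needed, `diff(periods=n_steps)` avoids calling a Python function for every window, which is usually the slow part of `rolling(...).apply(...)`.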


answered Nov 12 '22 23:11

Pierluigi

You can do the same thing as in https://stackoverflow.com/a/48345749/1011724 if you work directly on the underlying numpy array:

import numpy as np
diff_kernel = np.array([1,-1])
np.convolve(rs, diff_kernel, 'same')

where rs is your pandas Series. Note that the result is a NumPy array, not a Series.
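
The kernel above covers a single step. One way to extend the idea to an n-step difference is a kernel of length n + 1 with +1 at one end and -1 at the other. This is a sketch under that assumption; `convolve_diff` is a hypothetical helper name, and `mode="valid"` is used so that only positions with a full n-step history are returned.

```python
import numpy as np

def convolve_diff(values, n):
    """n-step difference values[i] - values[i - n] via convolution
    (hypothetical helper; returns len(values) - n elements)."""
    kernel = np.zeros(n + 1)
    kernel[0] = 1.0    # weight on the current sample
    kernel[-1] = -1.0  # weight on the sample n steps back
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])
result = convolve_diff(x, 4)  # same values as x[4:] - x[:-4]
```

Because "valid" drops the first n positions, the output is shorter than the input and needs padding before it is aligned with the original Series.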

answered Nov 13 '22 00:11

Dan

This should work:

import numpy as np

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff(arr, N):
    return np.array([arr[i] - arr[i-N] for i in range(N, len(arr))])

running_diff(x, 4)  # array([-6,  3, -2,  0, 11])

For a given pd.Series s, you will have to decide what you want for the first few items, which have no value N steps earlier. The example below simply keeps the original series values in those positions.

s_roll_diff = np.hstack((s.values[:4], running_diff(s.values, 4)))

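This works because you can assign a np.array directly to a pd.DataFrame column, e.g. for a column s: df['s_roll_diff'] = np.hstack((df['s'].values[:4], running_diff(df['s'].values, 4))). Use bracket assignment here; attribute-style assignment (df.s_roll_diff = ...) does not create a new column.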
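
Since the question is about speed, it may be worth noting that the list comprehension in running_diff can be replaced by array slicing. The following sketch assumes the same padding choice as above (keep the first N original values); `running_diff_vec` and the DataFrame contents are illustrative only.

```python
import numpy as np
import pandas as pd

def running_diff_vec(arr, n):
    """Same result as running_diff, computed with slicing:
    arr[i] - arr[i - n] for i in n .. len(arr) - 1."""
    arr = np.asarray(arr)
    return arr[n:] - arr[:-n]

df = pd.DataFrame({"s": [1, 3, 6, 1, -5, 6, 4, 1, 6]})
n = 4
values = df["s"].to_numpy()

# keep the first n original values, then append the n-step differences
df["s_roll_diff"] = np.hstack((values[:n], running_diff_vec(values, n)))
```

The slicing version does the subtraction in a single NumPy operation instead of one Python-level iteration per element.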

answered Nov 12 '22 23:11

jpp

If you get KeyError: 0 when indexing the window by label (e.g. x[1] - x[0]), use positional indexing with iloc instead:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])

answered Nov 13 '22 01:11

Manualmsdos
