Compute rolling z-score in pandas dataframe

Tags: python, pandas

Is there an open-source function to compute a moving z-score, like https://turi.com/products/create/docs/generated/graphlab.toolkits.anomaly_detection.moving_zscore.create.html? I have access to pandas rolling_std for computing the standard deviation, but want to see if it can be extended to compute rolling z-scores.

261

asked Nov 07 '17 18:11

user308827

1 Answer

rolling.apply with a custom function is significantly slower than the built-in rolling functions (such as mean and std). Therefore, compute the rolling z-score from the rolling mean and rolling std:

def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    z = (x-m)/s
    return z

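According to the definition given on the page linked in the question, the rolling z-score depends on the rolling mean and std just prior to the current point. The shift(1) above achieves this: it moves each window's statistics down one row, so every point is compared against the window that ends immediately before it.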
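To make the effect of shift(1) concrete, here is a minimal sketch on a made-up five-point Series (the values and the window size of 3 are chosen only for illustration). It checks the last z-score against a manual calculation over the three points that precede it:

```python
import pandas as pd

# Toy data, purely illustrative: a steady ramp followed by a jump.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
window = 3

r = x.rolling(window=window)
m = r.mean().shift(1)        # mean of the `window` points before each position
s = r.std(ddof=0).shift(1)   # population std of those same points
z = (x - m) / s

# Manual check for the last point: the preceding window is [2, 3, 4].
prev = x.iloc[1:4]
expected = (x.iloc[4] - prev.mean()) / prev.std(ddof=0)

print(z)
print(expected)
```

The first `window` entries of z are NaN, because no complete prior window exists for them yet.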

Below, even for a small Series (of length 100), zscore is over 5x faster than using rolling.apply. rolling.apply(zscore_func) calls zscore_func once for each rolling window, in essentially a Python loop, whereas r.mean() and r.std() are Cythonized. The speed advantage of zscore therefore grows with the length of the Series.
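The per-window Python call can be observed directly. The sketch below wraps the window function in a counter (the `calls` dict and `counting_zscore_func` are hypothetical names introduced here, and the random data is arbitrary). It assumes a pandas version where rolling.apply accepts raw=True, so that each window arrives as a NumPy array:

```python
import numpy as np
import pandas as pd

calls = {"n": 0}  # hypothetical counter, only for this illustration

def counting_zscore_func(w):
    # w is a NumPy array of length window + 1 because raw=True is passed below:
    # the `window` prior points followed by the current point.
    calls["n"] += 1
    return (w[-1] - w[:-1].mean()) / w[:-1].std()

x = pd.Series((np.random.default_rng(0).random(100) - 0.5).cumsum())
window = 5

out = x.rolling(window=window + 1).apply(counting_zscore_func, raw=True)

# Number of Python-level calls made while rolling over the Series.
print(calls["n"], "calls for a Series of length", len(x))
```

The count scales with the length of the Series, which is the overhead the vectorized mean/std version avoids.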

In [58]: %timeit zscore(x, N)
1000 loops, best of 3: 903 µs per loop

In [59]: %timeit zscore_using_apply(x, N)
100 loops, best of 3: 4.84 ms per loop

This is the setup used for the benchmark:

import numpy as np
import pandas as pd
np.random.seed(2017)

def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    z = (x-m)/s
    return z


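def zscore_using_apply(x, window):
    def zscore_func(x):
        # x holds the `window` prior points followed by the current point
        return (x[-1] - x[:-1].mean())/x[:-1].std(ddof=0)
    # raw=True passes each window as a NumPy array, so x[-1] is positional
    return x.rolling(window=window+1).apply(zscore_func, raw=True)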

N = 5
x = pd.Series((np.random.random(100) - 0.5).cumsum())

result = zscore(x, N)
alt = zscore_using_apply(x, N)

assert not ((result - alt).abs() > 1e-8).any()
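Since the question asks about a DataFrame, it may help to note that the same function appears to work unchanged on one: rolling, shift, and the arithmetic all operate column by column. A minimal sketch, with hypothetical column names "a" and "b" and arbitrary random data:

```python
import numpy as np
import pandas as pd

def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    return (x - m) / s

rng = np.random.default_rng(2017)
# Hypothetical two-column frame; any numeric columns would do.
df = pd.DataFrame({
    "a": (rng.random(50) - 0.5).cumsum(),
    "b": (rng.random(50) - 0.5).cumsum(),
})

z_df = zscore(df, 5)       # z-scores for every column at once
z_a = zscore(df["a"], 5)   # same computation on a single column

print(z_df.tail())
```

Each column of z_df should match the result of calling zscore on that column alone.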

165

answered Sep 20 '22 14:09

unutbu

