
How to apply a function not returning a numeric value to a pandas rolling Window?

I have a pandas Series of dtype float64 with a datetime index. I am trying to apply a custom function to a rolling window on the series, and I want this function to return strings. However, this raises a TypeError. Why does the error occur, and is there a way to make this work by applying a single function?

Here is an example:

import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1,high=100,size=100),index=pd.date_range(start='2000-01-01',freq='W',periods=100))
number_series = number_series.astype(float)

def func(s):
    
    if s[-1] > s[-2] > s[-3]:
        return 'High'
    elif s[-1] > s[-2]:
        return 'Medium'
    else:
        return 'Low'

new_series = number_series.rolling(5).apply(func)

The result is the following error:

TypeError: must be real number, not str

My current workaround is to change func so that it returns integers, and then to apply a second function to the resulting series to generate the labels, as in the example below:

def func_float(s):
    
    if s[-1] > s[-2] > s[-3]:
        return 1
    elif s[-1] > s[-2]:
        return 2
    else:
        return 3
    
float_series = number_series.rolling(5).apply(func_float)

def func_text(s):

    if s == 1:
        return 'High'
    elif s == 2:
        return 'Medium'
    else:
        return 'Low'
    
new_series = float_series.apply(func_text)

This gives the result I expected from the initial code that raised the error:

new_series

2000-01-02       Low
2000-01-09       Low
2000-01-16       Low
2000-01-23       Low
2000-01-30    Medium
               ...  
2001-10-28       Low
2001-11-04    Medium
2001-11-11      High
2001-11-18      High
2001-11-25       Low
Length: 100, dtype: object
agftrading asked Feb 24 '21 00:02


1 Answer

Note that apply on a Rolling object is different from apply on a Series object, and I agree that this is a bit confusing. Rolling.apply expects the function to return a single numeric value per window and stores the results as floats, which is why returning a string raises the TypeError. As I understand it, functions applied to rolling windows are typically meant for aggregating data (sum, count, etc.).
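
As a minimal sketch of that behavior (the names values and last_minus_first are made up for illustration): when the function returns a number, Rolling.apply produces a float64 series, with NaN for the incomplete windows at the start.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
values = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

def last_minus_first(window):
    # With raw=True the window arrives as a NumPy array,
    # so negative positional indexing is unambiguous
    return window[-1] - window[0]

result = values.rolling(3).apply(last_minus_first, raw=True)

print(result.dtype)     # float64
print(result.tolist())  # [nan, nan, 1.0, 2.0, 2.0]
```

Since every result is stored as a float, a string return value has no place to go, which matches the error in the question.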

However, you can iterate over the rolling windows yourself and apply the function to each window, collecting the results in a list (thanks to this discussion).

So my approach would be:

import numpy as np
import pandas as pd

np.random.seed(1)
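number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
).astype(float)

def func(s):
    # Windows at the start of the series hold fewer than three values
    if len(s) > 2:
        # .iloc makes the positional lookup explicit
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'
    else:
        return ''

# Iterating over the Rolling object yields one Series per window;
# the result is named labels to avoid shadowing the built-in list
labels = [func(window) for window in number_series.rolling(5)]
new_series = pd.Series(labels, index=number_series.index)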
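
The list comprehension above relies on iterating over the Rolling object. As a small sketch of what that iteration yields (the series toy is hypothetical), the windows at the start of the series come back shorter than the window size rather than being skipped:

```python
import pandas as pd

# Hypothetical toy series, for illustration only
toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])

# Each iteration step yields the window ending at that row as a Series
window_lengths = [len(window) for window in toy.rolling(5)]

print(window_lengths)  # [1, 2, 3, 4, 5, 5, 5]
```

If labels should only be computed on full windows, one option would be to compare len(window) against the window size inside the function.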

Also note that func needs to handle the first windows differently: they contain fewer than three values, so looking up the third-from-last element would otherwise be out of bounds.
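
If you would rather keep Rolling.apply, the two-step workaround from the question could probably be condensed. This is only a sketch under a few assumptions: func_code and the code_to_label dictionary are hypothetical names, raw=True is used so the function receives a NumPy array, and the NaN results from incomplete windows are filled with 'Low' to mirror the output shown in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
).astype(float)

def func_code(window):
    # window is a NumPy array because raw=True is passed below
    if window[-1] > window[-2] > window[-3]:
        return 1
    elif window[-1] > window[-2]:
        return 2
    else:
        return 3

# Hypothetical lookup table from numeric code to label
code_to_label = {1.0: 'High', 2.0: 'Medium', 3.0: 'Low'}

codes = number_series.rolling(5).apply(func_code, raw=True)
# Incomplete leading windows are NaN; the question's workaround labels them 'Low'
new_series = codes.map(code_to_label).fillna('Low')
```

The function still returns numbers, so Rolling.apply is satisfied, and the strings only appear in the final map step.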

Gerd answered Sep 22 '22 13:09