I have a float64 series with a DatetimeIndex. I am trying to apply a custom function to a rolling window on the series, and I want this function to return strings. However, this raises a TypeError. Why does this raise the error, and is there a way to make this work directly, with a single function?
Here is an example:
```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)
number_series = number_series.astype(float)

def func(s):
    # s is the current 5-element window (a Series, since raw=False by default)
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 'High'
    elif s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

new_series = number_series.rolling(5).apply(func)
```
The result is the following error:
```
TypeError: must be real number, not str
```
My current workaround is to change `func` so that it returns integer codes, which gives a numeric series, and then to apply a second function to that series to turn the codes into strings, as in the example below:
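```python
def func_float(s):
    # Return a numeric code instead of a label
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 1
    elif s.iloc[-1] > s.iloc[-2]:
        return 2
    else:
        return 3

float_series = number_series.rolling(5).apply(func_float)

def func_text(s):
    # Map each code back to its label (NaN falls through to 'Low')
    if s == 1:
        return 'High'
    elif s == 2:
        return 'Medium'
    else:
        return 'Low'

new_series = float_series.apply(func_text)
```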
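The two-step workaround can be shortened a little. The sketch below is one possible variation, not code from the original post: it assumes `raw=True` (so each window is a plain NumPy array and `w[-1]` is positional), and it replaces `func_text` with `Series.map` and a lookup dict. The names `func_code` and `labels` are hypothetical. It is still two steps, because `Rolling.apply` itself can only return numbers.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
).astype(float)

def func_code(w):
    # Hypothetical helper: w is a NumPy array because raw=True is assumed
    if w[-1] > w[-2] > w[-3]:
        return 1
    elif w[-1] > w[-2]:
        return 2
    return 3

# Rolling.apply returns float codes (NaN for the first 4 incomplete windows)
codes = number_series.rolling(5).apply(func_code, raw=True)

# Look each code up in a dict; anything not found (the NaNs) becomes 'Low',
# matching the else-branch of func_text
labels = {1.0: 'High', 2.0: 'Medium', 3.0: 'Low'}
new_series = codes.map(lambda c: labels.get(c, 'Low'))
```

Using a dict keeps the code-to-label mapping in one place, so adding a new category only touches `func_code` and `labels`.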
This produces the result I expected from the initial code that raised the error:
```
new_series
2000-01-02       Low
2000-01-09       Low
2000-01-16       Low
2000-01-23       Low
2000-01-30    Medium
                ...
2001-10-28       Low
2001-11-04    Medium
2001-11-11      High
2001-11-18      High
2001-11-25       Low
Length: 100, dtype: object
```
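Note that the `apply` function for a `Rolling` object is different from the `apply` function for a `Series` object, and I agree with you that this is a bit confusing. `Rolling.apply` expects your function to reduce each window to a single numeric value: the results are collected into a float64 array, so a string return value cannot be stored, which is where `TypeError: must be real number, not str` comes from. In other words, the functions applied to rolling windows are meant for aggregating data (such as `sum`, `count`, etc.).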
However, a `Rolling` object is iterable, so you can loop over its windows and apply your function to each one yourself (thanks to this discussion).
So my approach would be:
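```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)
number_series = number_series.astype(float)

def func(s):
    # The first windows yielded by iteration are shorter than 5,
    # so only compare once there are at least 3 values
    if len(s) > 2:
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'
    else:
        return ''

# Iterating over a Rolling object yields one window (a Series) per row.
# Don't name the result `list`: that would shadow the built-in.
labels = [func(window) for window in number_series.rolling(5)]
new_series = pd.Series(labels, index=number_series.index)
```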
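Also note that `func` needs to handle the first items differently: when you iterate over a `Rolling` object, the first windows contain fewer values than the window size, so `s.iloc[-3]` would otherwise be out of bounds.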
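To see why those first windows need special handling, here is a small sketch on a made-up series, assuming pandas 1.1 or later (where `Rolling` objects became iterable). The windows yielded by iteration grow from length 1 up to the window size instead of being replaced by NaN as they are in `Rolling.apply`.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# One window per row; the first two are shorter than the window size of 3
windows = list(s.rolling(3))
window_lengths = [len(w) for w in windows]
print(window_lengths)  # [1, 2, 3, 3, 3, 3]
```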