I have a DataFrame like this:
df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a
I'm trying to understand how to apply a custom rolling function to it. I've tried doing this:
df2.rolling(2).apply(lambda x: 1)
But this gives me the original DataFrame back:
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a
If I have a different DataFrame, like this:
df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})
The same rolling apply seems to work:
df3.rolling(2).apply(lambda x: 1)
     a  value
0  NaN    NaN
1  1.0    1.0
2  1.0    1.0
Why is this not working for the first DataFrame?
Pandas version: 0.20.2
Python version: 2.7.10
Update
So, I've realized that df2's columns are object-type, whereas the output of my lambda function is an integer. df3's columns are both integer columns. I'm assuming that this is why the apply isn't working.
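The dtype difference can be confirmed with a quick check. This is a minimal sketch that rebuilds the two frames from above; the names df2_numeric and df3_numeric are only for illustration:

```python
import pandas as pd

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})
df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# For each column, ask pandas whether its dtype counts as numeric
df2_numeric = [pd.api.types.is_numeric_dtype(df2[c]) for c in df2.columns]
df3_numeric = [pd.api.types.is_numeric_dtype(df3[c]) for c in df3.columns]

print(df2_numeric)  # string columns: not numeric
print(df3_numeric)  # integer columns: numeric
```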
The following doesn't work:
df2.rolling(2).apply(lambda x: 'a')
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a
Furthermore, say I want to concatenate the characters in the value column on a rolling basis, so that the output of the lambda function is a string, rather than an integer. The following also doesn't work:
df2.rolling(2).apply(lambda x: '.'.join(x))
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a
What's going on here? Can rolling operations be applied to object-type columns in pandas?
Window Rolling Mean (Moving Average)
The moving average calculation produces an average for each row based on the window we specify. It is also called a “rolling mean” because it averages the values within a specified range for each row as it moves along the DataFrame.
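As a small illustration of a rolling mean on numeric data (a sketch with made-up values, not taken from the question):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Each result is the mean of the current value and the one before it;
# the first row has no full window, so it is NaN
rolling_mean = s.rolling(window=2).mean()
print(rolling_mean)
```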
A key takeaway about pandas performance is that doing operations per row in pandas dataframes is typically slow, but using columns as series to do vectorised operations on (taking a whole column at a time) is typically fast.
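A rough sketch of the difference, using the numeric frame from the question. Both forms give the same result, but the second works on whole columns at once; the names row_wise and vectorised are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# Row by row: a Python function is called once per row
row_wise = df.apply(lambda row: row['a'] + row['value'], axis=1)

# Vectorised: a single operation over two whole columns
vectorised = df['a'] + df['value']

print(row_wise.equals(vectorised))
```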
The apply() method acts much like Python's map() function: it takes a function as input and applies it along an axis of the DataFrame. When working with tabular data, you specify the axis the function should act on (0 to apply it to each column, 1 to apply it to each row).
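A minimal sketch of the axis argument, again on the numeric frame from the question (col_sums and row_sums are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# axis=0: the function receives each column in turn
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row in turn
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums)
print(row_sums)
```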
Numba can be used in 2 ways with pandas: Specify the engine="numba" keyword in select pandas methods. Define your own Python function decorated with @jit and pass the underlying NumPy array of Series or DataFrame (using to_numpy() ) into the function.
Here is one way this could be approached. Note that rolling is a wrapper around efficient numeric (numpy) methods; this approach is not, and it does not share that efficiency. It merely provides a similar API to allow rolling on non-numeric columns:
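import pandas as pd

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        # Make DataFrame operations return MyDataFrame, not pd.DataFrame
        return MyDataFrame

    def rolling_object(self, window, column, default):
        # One shifted copy of the column per window position, side by side;
        # gaps are filled with `default`, then transposed so that each
        # column of the result holds one window
        return pd.concat(
            [self[column].shift(i) for i in range(window)],
            axis=1).fillna(default).T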
This creates a custom DataFrame class that has a rolling_object method. It does not match the pandas way well, in that it only operates on a single column at a time.
df2 = MyDataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                   'value': ['a', 'b', 'c'],
                   'num': [1, 2, 3]
                   })
print(df2.rolling_object(2, 'value', '').apply(lambda x: '.'.join(x)))
0     a.
1    b.a
2    c.b
dtype: object
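If subclassing DataFrame feels heavy, the same idea can be sketched with a plain function. This is only an illustrative alternative, not part of pandas: rolling_join is a hypothetical helper, and unlike the output above it joins oldest-first and does not pad incomplete windows.

```python
import pandas as pd

def rolling_join(series, window, sep='.'):
    """Hypothetical helper: join each value with up to (window - 1) values before it."""
    values = series.tolist()
    joined = [
        # Slice from the start of the window up to and including position i
        sep.join(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]
    return pd.Series(joined, index=series.index)

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'c']})

result = rolling_join(df2['value'], 2)
print(result)
```

Like the class above, this loops in Python rather than using the optimized rolling machinery, so it is best suited to small or moderate amounts of data.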