
pandas rolling apply doesn't do anything

Tags: python, pandas

I have a DataFrame like this:

import pandas as pd

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

I'm trying to understand how to apply a custom rolling function to it. I've tried doing this:

df2.rolling(2).apply(lambda x: 1)

But this gives me the original DataFrame back:

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

If I have a different DataFrame, like this:

df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

The same rolling apply seems to work:

df3.rolling(2).apply(lambda x: 1)

     a  value
0  NaN    NaN
1  1.0    1.0
2  1.0    1.0

Why is this not working for the first DataFrame?

Pandas version: 0.20.2

Python version: 2.7.10

Update

So, I've realized that df2's columns are object-type, whereas the output of my lambda function is an integer. df3's columns are both integer columns. I'm assuming that this is why the apply isn't working.
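The dtype difference described here can be checked directly. This is a minimal sketch that rebuilds the two frames from the question and uses `pandas.api.types.is_numeric_dtype`; the variable names `df2_numeric` and `df3_numeric` are only for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})
df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# Map each column name to whether pandas treats that column as numeric.
df2_numeric = {col: is_numeric_dtype(df2[col]) for col in df2.columns}
df3_numeric = {col: is_numeric_dtype(df3[col]) for col in df3.columns}

print(df2_numeric)  # both columns report False (string data)
print(df3_numeric)  # both columns report True (integer data)
```

Neither column of df2 is numeric, while both columns of df3 are, which is consistent with the behaviour observed in the question.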

The following doesn't work:

df2.rolling(2).apply(lambda x: 'a')
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

Furthermore, say I want to concatenate the characters in the value column on a rolling basis, so that the output of the lambda function is a string, rather than an integer. The following also doesn't work:

df2.rolling(2).apply(lambda x: '.'.join(x))

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

What's going on here? Can rolling operations be applied to object-type columns in pandas?

Asked by LateCoder, Jun 11 '17 at 00:06



1 Answer

Here is one way to approach this. Note that rolling is a wrapper for numpy methods and gets its efficiency from them; the code below does neither. It merely provides a similar API to allow rolling on non-numeric columns:

Code:

import pandas as pd

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

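    def rolling_object(self, window, column, default):
        # Stack `column` shifted by 0..window-1 rows side by side, fill the
        # gaps left by shifting with `default`, then transpose so that each
        # resulting column holds one window (newest value first).
        return pd.concat(
            [self[column].shift(i) for i in range(window)],
            axis=1).fillna(default).T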

This creates a custom DataFrame class that has a rolling_object method. It does not match the pandas way well, because it operates on only a single column at a time.
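The same shift-and-concatenate idea also works without subclassing DataFrame. The sketch below is an assumption-laden variant, not part of the class above: `rolling_windows` is a hypothetical helper that takes a single Series and returns the transposed frame of windows.

```python
import pandas as pd

def rolling_windows(series, window, default=''):
    """Hypothetical helper: return a frame with one column per row of
    `series`, holding that row's value followed by the previous
    `window - 1` values (newest first), padded with `default`."""
    shifted = [series.shift(i) for i in range(window)]
    return pd.concat(shifted, axis=1).fillna(default).T

values = pd.Series(['a', 'b', 'c'])
windows = rolling_windows(values, 2)

# apply() on a DataFrame runs the function once per column, i.e. per window.
joined = windows.apply(lambda col: '.'.join(col))
print(joined)
```

A standalone function avoids maintaining a DataFrame subclass, at the cost of not being callable as a method on the frame.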

Test Code:

df2 = MyDataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                   'value': ['a', 'b', 'c'],
                   'num': [1, 2, 3]
                   })

print(df2.rolling_object(2, 'value', '').apply(lambda x: '.'.join(x)))

Results:

0     a.
1    b.a
2    c.b
dtype: object
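
The output above lists each window newest first and pads the first window with the default value, hence `a.`. If oldest-first order without padding is preferred, a plain Python loop is one alternative. This is only a sketch; `rolling_join` is a hypothetical helper, not a pandas API, and it makes no attempt at the efficiency of the built-in rolling methods.

```python
import pandas as pd

def rolling_join(series, window, sep='.'):
    """Hypothetical helper: join each value with up to `window - 1`
    preceding values, oldest first, without padding short windows."""
    values = series.tolist()
    joined = []
    for end in range(1, len(values) + 1):
        start = max(0, end - window)  # clamp the window at the first row
        joined.append(sep.join(values[start:end]))
    return pd.Series(joined, index=series.index)

result = rolling_join(pd.Series(['a', 'b', 'c']), 2)
print(result.tolist())  # ['a', 'a.b', 'b.c']
```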
Answered by Stephen Rauch, Oct 22 '22 at 18:10