pandas rolling apply doesn't do anything

Tags:

pandas

I have a DataFrame like this:

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

I'm trying to understand how to apply a custom rolling function to it. I've tried doing this:

df2.rolling(2).apply(lambda x: 1)

But this gives me the original DataFrame back:

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

If I have a different DataFrame, like this:

df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

The same rolling apply seems to work:

df3.rolling(2).apply(lambda x: 1)

     a  value
0  NaN    NaN
1  1.0    1.0
2  1.0    1.0

Why is this not working for the first DataFrame?

Pandas version: 0.20.2

Python version: 2.7.10

Update

So, I've realized that df2's columns are object-type, whereas the output of my lambda function is an integer. df3's columns are both integer columns. I'm assuming that this is why the apply isn't working.
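
The dtype difference can be confirmed directly. This is a minimal sketch; the `is_numeric_dtype` helper from `pandas.api.types` is used for illustration and is not part of the original attempt:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})
df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# Map each column name to whether pandas treats it as numeric.
df2_numeric = {col: is_numeric_dtype(df2[col]) for col in df2.columns}
df3_numeric = {col: is_numeric_dtype(df3[col]) for col in df3.columns}

print(df2_numeric)  # every column of df2 holds strings, so none is numeric
print(df3_numeric)  # both columns of df3 hold integers
```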

The following doesn't work:

df2.rolling(2).apply(lambda x: 'a')
         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

Furthermore, say I want to concatenate the characters in the value column on a rolling basis, so that the output of the lambda function is a string, rather than an integer. The following also doesn't work:

df2.rolling(2).apply(lambda x: '.'.join(x))

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a

What's going on here? Can rolling operations be applied to object-type columns in pandas?

asked Jun 11 '17 00:06

LateCoder

1 Answer

Here is one way to approach this. Note that pandas' rolling is a wrapper around numpy methods and gets its efficiency from them; this is not that. It merely provides a similar API, to allow rolling on non-numeric columns:

Code:

import pandas as pd

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

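    def rolling_object(self, window, column, default):
        # Place shifted copies of the column side by side (shifts 0..window-1),
        # fill the gaps at the start with `default`, then transpose so that
        # each resulting column holds the window for one original row.
        return pd.concat(
            [self[column].shift(i) for i in range(window)],
            axis=1).fillna(default).T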

This creates a custom DataFrame class that has a rolling_object method. It does not match the pandas way of doing things very well, since it only operates on a single column at a time.
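
To see what the shift-and-concat step produces before any function is applied, the same idea can be traced on a plain Series. This is only an illustrative sketch; the names `s`, `shifted`, `windows` and `joined` are made up for this example:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], name='value')
window = 2

# One column per shift: shift(0) is the current row, shift(1) the previous row.
shifted = pd.concat([s.shift(i) for i in range(window)], axis=1)

# Fill the missing leading entries, then transpose so each column is one window.
windows = shifted.fillna('').T
print(windows)

# Applying a function column by column now acts on one window at a time.
joined = windows.apply(lambda col: '.'.join(col))
print(joined.tolist())
```

Each window lists the current value first and the previous value second, which is why the joined strings read newest to oldest.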

Test Code:

df2 = MyDataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                   'value': ['a', 'b', 'c'],
                   'num': [1, 2, 3]
                   })

print(df2.rolling_object(2, 'value', '').apply(lambda x: '.'.join(x)))

Results:

0     a.
1    b.a
2    c.b
dtype: object
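
If subclassing DataFrame is more than is needed, a plain function over a Series may be enough. The sketch below is an assumption-laden alternative, not the approach above: `rolling_join` is a hypothetical helper, it joins oldest to newest, and it uses shorter windows at the start instead of a fill value, so its output differs from the results shown:

```python
import pandas as pd

def rolling_join(series, window, sep='.'):
    """Join the last `window` string values ending at each row (hypothetical helper)."""
    values = series.tolist()
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # shorter window at the start
        out.append(sep.join(values[start:i + 1]))
    return pd.Series(out, index=series.index)

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'c']})
result = rolling_join(df2['value'], 2)
print(result.tolist())
```

Like the class above, this is a pure Python loop and has none of the speed of the built-in numeric rolling operations.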

answered Oct 22 '22 18:10

Stephen Rauch
