Why doesn't my pandas rolling().apply() work when the series contains collections?

Tags: python, pandas

I've got a pandas series in which each cell is a tuple. I'm trying to do a rolling().apply() on that series, and the function I'm trying to apply is never getting called. Here's a silly example that shows what I'm talking about:

>>> import pandas as pd
>>> pd.__version__
u'0.18.0'
>>> die = lambda x: 0/0

>>> s = pd.Series(zip(range(5), range(5)))
>>> s
0    (0, 0)
1    (1, 1)
2    (2, 2)
3    (3, 3)
4    (4, 4)
dtype: object

A simple apply works as expected, in that the function is called:

>>> s.apply(die)
[...]
ZeroDivisionError: integer division or modulo by zero

But a rolling().apply() does nothing at all; in particular, the function that is supposed to be applied never gets called:

>>> s.rolling(2).apply(die)
0    (0, 0)
1    (1, 1)
2    (2, 2)
3    (3, 3)
4    (4, 4)
dtype: object

This is the simplest example that demonstrates what I'm talking about, but the same thing happens with sets & lists.
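For contrast, here is a minimal check, sketched under the assumption of a recent pandas (0.23 or later, where the raw argument of Rolling.apply exists). It shows that rolling().apply() does call the function when the data is numeric, and that each window reaches the function as a float64 array. The record helper is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd

seen = []  # dtypes of the windows handed to the function


def record(window):
    # Hypothetical helper: remember the dtype pandas passes in, then aggregate.
    seen.append(window.dtype)
    return window.sum()


numeric = pd.Series(range(5))  # int64 data, so rolling can coerce it to float64
out = numeric.rolling(2).apply(record, raw=True)
print(out.tolist())  # [nan, 1.0, 3.0, 5.0, 7.0]
print(set(seen))
```

The function runs once per full window, and the integer input has been converted to float64 before it arrives. Tuples cannot be converted that way, which is why they behave differently.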

Why does this happen, and how can I do a rolling apply with a custom function on a series of collections?

David asked Apr 19 '16 at 15:04


1 Answer

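This will not work because pandas.DataFrame.rolling (and Series.rolling) returns a Window or Rolling object, and its apply() is not the same as pandas.DataFrame.apply, which simply applies a function along an axis of the DataFrame. A rolling apply only operates on numeric data: each window is converted to float64 before it is passed to your function. A series of tuples (or sets, or lists) has object dtype and cannot be converted to floats, so the function is never called. In pandas 0.18 the values were passed through unchanged, as in the question; newer versions raise an error instead. This was also pointed out by ayhan in this post.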
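One possible workaround, sketched here assuming pandas 0.23 or later (for raw=True), is to roll over the integer positions of the series, which are numeric, and use each window of positions to look up the original tuples. The helper name rolling_apply_objects is hypothetical. Because Rolling.apply must produce one number per window, func has to return a scalar; if you need non-numeric results, a plain Python loop over slices of the series is simpler.

```python
import numpy as np
import pandas as pd


def rolling_apply_objects(s, window, func):
    """Hypothetical helper: apply func to each full window of an object-dtype Series.

    Rolling.apply only handles numeric data, so we roll over the positions
    0..n-1 (stored as floats) and use them to fetch the original objects.
    func receives an object ndarray of the window's elements and must
    return a scalar number.
    """
    values = s.values  # object ndarray holding the tuples/sets/lists
    positions = pd.Series(np.arange(len(s), dtype=float), index=s.index)

    def on_window(pos):
        # With raw=True, pos is a float64 ndarray of positions in this window.
        return func(values[pos.astype(int)])

    return positions.rolling(window).apply(on_window, raw=True)


# zip() is lazy in Python 3, so materialize it before building the Series.
s = pd.Series(list(zip(range(5), range(5))))

# Sum the first element of each tuple over a window of 2.
result = rolling_apply_objects(s, 2, lambda w: sum(a for a, b in w))
print(result.tolist())  # [nan, 1.0, 3.0, 5.0, 7.0]
```

The first entry is NaN because, by default, min_periods equals the window size, so the first window is incomplete and func is not called for it.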

2Obe answered Sep 21 '22 at 05:09