
Generate combinations of values from rolling window in Pandas

For every row in my dataframe, I need to create every combination of two values of column a from a three-day sliding window ending at that row. My dataframe is like this:

import pandas as pd    
df = pd.DataFrame({'a': [1, 2, 3, 4, 5]},
                   index=[pd.Timestamp('20180101'),
                          pd.Timestamp('20180102'),
                          pd.Timestamp('20180103'),
                          pd.Timestamp('20180105'),
                          pd.Timestamp('20180106')])

Note that the time index is ragged (inconsistent intervals between rows). The combinations should come out to be:

row0: None
row1: [(1, 2)]
row2: [(1, 2), (1, 3), (2, 3)]
row3: [(3, 4)]
row4: [(4, 5)]

Without the window this is easy enough: itertools.combinations generates every combination of two elements of column a:

import itertools as it
combos = it.combinations(df['a'], 2)
for c in combos:
    print(c)
# (1, 2)
# (1, 3)
# (1, 4)
# (1, 5)
# etc.

However, I need the windowed version for my application. My best bet so far is df.rolling, which handles simple things like summing the elements over a three-day window:

df.rolling('3d').sum()
# gives [1, 3, 6, 7, 9], as expected

but I can't seem to perform more complicated operations (or return more complicated types than real numbers from an operation) on the rolling window.


Question

How do I use df.rolling to make combinations over my rolling window? Or is there some other tool to do this?


Attempts

My thought so far is that there is some way to use df.rolling and df.apply along with it.combinations to generate iterators for each window in my dataframe, and then plug that iterator into a new column of my dataframe. Something like:

df.rolling('3d').apply(lambda x: it.combinations(x, 2))

which gives a TypeError:

TypeError: must be real number, not itertools.combinations

because the function passed to df.rolling(...).apply must return a single real number, not an arbitrary object or a list.

I also tried using it.combinations directly on the rolling window:

it.combinations(df.rolling('3d'), 2)

which gives:

KeyError: 'Column not found: 0'

and if I select column a explicitly:

it.combinations(df.rolling('3d')['a'], 2)

I get:

Exception: Column(s) a already selected

So is there maybe a way to define a function that I can call with df.apply that plugs the iterator over my rolling window into a new column for each row of my dataframe? Can I even operate on rows other than the current row in a function passed to apply?

asked Jun 12 '18 19:06 by Engineero


1 Answer

Okay, this is a hack, but it might be useful.

All we want to do is reuse df.rolling's windowing facilities. We could try to look inside some non-public parts of the code, but instead let's just take advantage of the fact we can force a function call inside apply before we return a float:

In [28]: dummy = df.rolling("3d")["a"].apply((lambda x: print(x) or 0), raw=False)
2018-01-01    1.0
dtype: float64
2018-01-01    1.0
2018-01-02    2.0
dtype: float64
2018-01-01    1.0
2018-01-02    2.0
2018-01-03    3.0
dtype: float64
2018-01-03    3.0
2018-01-05    4.0
dtype: float64
2018-01-05    4.0
2018-01-06    5.0
dtype: float64
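
The trick works because print returns None, which is falsy, so the "or" falls through to 0 and apply still receives the number it insists on. Here is a minimal sketch of the same idiom in plain Python, with a hypothetical record helper standing in for whatever side effect you want per window:

```python
seen = []

def record(window):
    """Hypothetical helper: store the window and return None, like list.append."""
    seen.append(list(window))

# record(...) returns None (falsy), so `or` goes on to evaluate and return 0.
callback = lambda window: record(window) or 0

results = [callback(w) for w in ([1], [1, 2], [1, 2, 3])]
print(results)  # every call returned the dummy 0
print(seen)     # the windows were captured as a side effect
```

The returned zeros are throwaway values; only the list filled on the side matters.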

And so:

In [29]: roll_slices = []

In [30]: dummy = df.rolling("3d")["a"].apply((lambda x: roll_slices.append(list(it.combinations(x, 2))) or 0), raw=False)

In [31]: roll_slices
Out[31]: 
[[],
 [(1.0, 2.0)],
 [(1.0, 2.0), (1.0, 3.0), (2.0, 3.0)],
 [(3.0, 4.0)],
 [(4.0, 5.0)]]

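After which you can do what you like with roll_slices, for example assign it to a new column. Note that the values come back as floats because rolling casts each window to float64.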
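
If relying on a side effect inside apply feels too fragile, the windows can also be built by hand. The following is only a sketch under stated assumptions, not something pandas provides: windowed_combinations is a hypothetical helper, it assumes the timestamps are sorted ascending, and it uses a window of (t - window, t] to mirror what rolling("3d") selected in the output above. It uses only the standard library so that it runs on its own:

```python
from datetime import datetime, timedelta
from itertools import combinations

def windowed_combinations(times, values, window, r=2):
    """For each row, list the r-combinations of the values whose timestamp
    lies in (t - window, t]. Assumes `times` is sorted ascending."""
    times, values = list(times), list(values)
    out = []
    start = 0  # index of the first row still inside the current window
    for end, t in enumerate(times):
        # Drop rows that are a full window (or more) older than the current row.
        while times[start] <= t - window:
            start += 1
        out.append(list(combinations(values[start:end + 1], r)))
    return out

# Same ragged dates and values as the dataframe in the question.
times = [datetime(2018, 1, d) for d in (1, 2, 3, 5, 6)]
combos = windowed_combinations(times, [1, 2, 3, 4, 5], timedelta(days=3))
for row in combos:
    print(row)
```

Because the values never pass through rolling, they stay integers. With the dataframe from the question, calling it as windowed_combinations(df.index, df['a'], pd.Timedelta('3d')) and assigning the result to a new column should give the same lists, since pd.Timestamp supports the same subtraction and comparison as datetime.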

answered Oct 05 '22 19:10 by DSM