For every row in my dataframe, I need to create every combination of two values of column a
from a three-day sliding window ending at that row. My dataframe is like this:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5]},
index=[pd.Timestamp('20180101'),
pd.Timestamp('20180102'),
pd.Timestamp('20180103'),
pd.Timestamp('20180105'),
pd.Timestamp('20180106')])
Note that the time index is ragged (inconsistent intervals between rows). The combinations should come out to be:
row0: None
row1: [(1, 2)]
row2: [(1, 2), (1, 3), (2, 3)]
row3: [(3, 4)]
row4: [(4, 5)]
I can do this easily enough without the window: just use itertools.combinations to generate every combination of two elements of column a with:
import itertools as it
combos = it.combinations(df['a'], 2)
for c in combos:
print(c)
# (1, 2)
# (1, 3)
# (1, 4)
# (1, 5)
# etc.
but I need the windowed version for my application. My best bet so far is to use df.rolling. I can do simple things like summing the elements over a three-day window with something like:
df.rolling('3d').sum()
# get [1, 3, 6, 7, 9] which we expect
but I can't seem to perform more complicated operations (or return more complicated types than real numbers from an operation) on the rolling window.
How do I use df.rolling to make combinations over my rolling window? Or is there some other tool to do this?
My thought so far is that there is some way to use df.rolling and df.apply along with it.combinations to generate iterators for each window in my dataframe, and then plug that iterator into a new column of my dataframe. Something like:
df.rolling('3d').apply(lambda x: it.combinations(x, 2))
which gives a TypeError:
TypeError: must be real number, not itertools.combinations
because df.rolling.apply requires that its argument return a single real value, not an object or a list.
I also tried using it.combinations directly on the rolling window:
it.combinations(df.rolling('3d'), 2)
which gives:
KeyError: 'Column not found: 0'
and if I select column a explicitly:
it.combinations(df.rolling('3d')['a'], 2)
I get:
Exception: Column(s) a already selected
So is there maybe a way to define a function that I can call with df.apply that plugs the iterator over my rolling window into a new column for each row of my dataframe? Can I even operate on rows other than the current row in a function passed to apply?
Okay, this is a hack, but it might be useful.
All we want to do is reuse df.rolling's windowing facilities. We could try to look inside some non-public parts of the code, but instead let's just take advantage of the fact we can force a function call inside apply before we return a float:
In [28]: dummy = df.rolling("3d")["a"].apply((lambda x: print(x) or 0), raw=False)
2018-01-01 1.0
dtype: float64
2018-01-01 1.0
2018-01-02 2.0
dtype: float64
2018-01-01 1.0
2018-01-02 2.0
2018-01-03 3.0
dtype: float64
2018-01-03 3.0
2018-01-05 4.0
dtype: float64
2018-01-05 4.0
2018-01-06 5.0
dtype: float64
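The slices printed above are consistent with each window covering the half-open interval (t - 3 days, t], which as far as I know is the default for offset-based rolling windows. As a rough cross-check, here is a minimal sketch that rebuilds the same windows by hand with boolean masks on the index. The helper name window_for is hypothetical, and the right-closed interval is an assumption rather than something stated in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05", "2018-01-06"]
    ),
)

width = pd.Timedelta(days=3)


def window_for(t):
    """Hypothetical helper: values of column a with timestamps in (t - width, t]."""
    # Assumption: the window excludes its left edge and includes its right edge.
    mask = (df.index > t - width) & (df.index <= t)
    return df.loc[mask, "a"]


# One list of values per row, in index order.
windows = [window_for(t).tolist() for t in df.index]
print(windows)
```

Under that assumption, the row for 2018-01-06 does not pick up 2018-01-03, because that timestamp sits exactly on the excluded left edge.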
And so:
In [29]: roll_slices = []
In [30]: dummy = df.rolling("3d")["a"].apply((lambda x: roll_slices.append(list(it.combinations(x, 2))) or 0), raw=False)
In [31]: roll_slices
Out[31]:
[[],
[(1.0, 2.0)],
[(1.0, 2.0), (1.0, 3.0), (2.0, 3.0)],
[(3.0, 4.0)],
[(4.0, 5.0)]]
After which you can do what you like.
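For instance, the question asked for the combinations to end up in a new column. Here is a minimal sketch of that, using the same side-effect trick. The names collect and combos are made up for illustration, and it assumes apply calls the function exactly once per row, in index order, which the assert checks before anything is assigned.

```python
import itertools as it

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05", "2018-01-06"]
    ),
)

roll_slices = []


def collect(window):
    """Hypothetical helper: record the pairs for one window, then return a float."""
    # With raw=False the window arrives as a float Series; cast back to int.
    values = [int(v) for v in window]
    roll_slices.append(list(it.combinations(values, 2)))
    return 0.0  # apply still insists on a real number


# The numeric result is discarded; only the side effect matters.
df["a"].rolling("3D").apply(collect, raw=False)

# Assumption: one call per row, in order. Fail loudly if that does not hold.
assert len(roll_slices) == len(df)

# Wrap in a Series so each list is stored as a single object per row.
df["combos"] = pd.Series(roll_slices, index=df.index)
print(df)
```

The first row gets an empty list rather than None, since a one-element window has no pairs.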