Here is an example DataFrame which I will use to better illustrate my question:
import pandas as pd
df = pd.DataFrame(pd.np.random.rand(30, 3), columns=tuple('ABC'))
df['event'] = pd.np.nan
df.loc[10, 'event'] = 'ping'
df.loc[20, 'event'] = 'ping'
df.loc[19, 'event'] = 'pong'
I need to create windows of n rows centered around each occurrence of ping
.
In other words, let i
be the index of a row that contains ping
in the event
column. For each i
, I want to select df.ix[i-n:i+n]
.
Thus, for n=3
, I would expect the following result:
A B C event
7 0.8295863 0.2162861 0.4856461 NaN
8 0.156646 0.4730667 0.9968878 NaN
9 0.6709413 0.4796197 0.8747416 NaN
10 0.09942329 0.154008 0.5761598 ping
11 0.7168143 0.678207 0.7281105 NaN
12 0.8915475 0.8013187 0.9049722 NaN
13 0.9545411 0.4844835 0.1645746 NaN
17 0.9909208 0.1091025 0.6582635 NaN
18 0.2536326 0.4324749 0.8001643 NaN
19 0.4734659 0.5582809 0.1221296 pong
20 0.7230407 0.6695843 0.3902591 ping
21 0.3624909 0.2685049 0.5484445 NaN
22 0.05626284 0.6113877 0.9131929 NaN
23 0.8312294 0.5694373 0.4325798 NaN
[14 rows x 4 columns]
A few caveats:
pong
value around which we do not want to center a window. It is captured in the result of centering around the second ping
, however.How can this be achieved?
In [17]: n = 3
Select an indexer that is the range of what you need, e.g. the target index +- 3 (subject to the max/min of the size of the frame). Concatenate them all, and eliminate dups.
In [18]: indexers = np.unique(np.concatenate([ np.arange(max(i-n,0),min(i+n,len(df))) for i in df[df.event=='ping'].index ]))
In [19]: indexers
Out[19]: array([ 7, 8, 9, 10, 11, 12, 17, 18, 19, 20, 21, 22])
Select them.
In [20]: df.iloc[indexers]
Out[20]:
A B C event
7 0.03348742 0.05735324 0.1220022 NaN
8 0.9567363 0.6539097 0.8409577 NaN
9 0.3115902 0.4955503 0.1749197 NaN
10 0.6883777 0.6185107 0.7933182 ping
11 0.5185129 0.6533616 0.1569159 NaN
12 0.1196976 0.9638604 0.7318006 NaN
17 0.02897615 0.1224485 0.5706852 NaN
18 0.02409971 0.4715463 0.4587161 NaN
19 0.9070592 0.3371241 0.9543977 pong
20 0.8533369 0.7549413 0.5334882 ping
21 0.9546738 0.8203931 0.8543028 NaN
22 0.05691086 0.2402766 0.3922318 NaN
Note that you might need to do a df.reset_index()
(before you select to get the actual row index position, rather than a value).
Note that their is a bug here as the setting of the 'event' column converts everything to object, see here. You can alleviate by using df.convert_objects()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With