I have a DataFrame
with a continuous measure, marked by occasional events:
TimeIndex Event Value
0 NaN 4.099969
1 NaN 3.833528
2 NaN -1.335025
3 A 4.420085
4 NaN 4.508899
5 NaN 4.557383
6 B -3.377152
7 NaN 4.508899
8 NaN -1.919803
9 A 2.18520
10 NaN 3.821221
11 C 0.922389
12 NaN 2.165784
I want the average for each event, but also the average two time points before and two time points after the event occurs. Something like this might work:
TimeIndex Event Value Around_A Around_B Around_C
0 NaN 4.099969 NaN NaN NaN
1 NaN 3.833528 -2 NaN NaN
2 NaN -1.335025 -1 NaN NaN
3 A 4.420085 0 NaN NaN
4 NaN 4.508899 1 -2 NaN
5 NaN 4.557383 2 -1 NaN
6 B -3.377152 NaN 0 NaN
7 NaN 4.508899 -2 1 NaN
8 NaN -1.919803 -1 2 NaN
9 A 2.18520 0 NaN 2
10 NaN 3.821221 1 NaN -1
11 C 0.922389 2 NaN 0
12 NaN 2.165784 NaN NaN 1
However: 1) I'm unsure how to get the new column values without looping and 2) appending a new column gets intractable for many different events (which I have)
Is there an easier way to select timepoints/rows around a value in pandas, and then average by time point/row?
My desired output is the average Value for Event x AroundTime (dummy means shown here)
Event AroundTime Value.mean
A -2 3.35
A -1 0.19
A 0 2.33
A 1 -1.01
A 2 3.78
B -2 4.53
B -1 4.22
B 0 5.14
B 1 1.88
B 2 0.70
C -2 -1.01
C -1 -2.33
C 0 1.69
C 1 1.19
C 2 2.21
I will suggest:
In [26]:
print df
TimeIndex Event Value
0 0 NaN 4.099969
1 1 NaN 3.833528
2 2 NaN -1.335025
3 3 A 4.420085
4 4 NaN 4.508899
5 5 NaN 4.557383
6 6 B -3.377152
7 7 NaN 4.508899
8 8 NaN -1.919803
9 9 A 2.185200
10 10 NaN 3.821221
11 11 C 0.922389
12 12 NaN 2.165784
[13 rows x 3 columns]
In [27]:
df['Around_A']=np.nan
In [28]:
for i in range(-2,3):
df['Around_A'][(df.Event=='A').shift(i).fillna(False)]=i
#or df.ix[(df.Event=='A').shift(i).fillna(False), 'Around_A']=i
In [29]:
print df
TimeIndex Event Value Around_A
0 0 NaN 4.099969 NaN
1 1 NaN 3.833528 -2
2 2 NaN -1.335025 -1
3 3 A 4.420085 0
4 4 NaN 4.508899 1
5 5 NaN 4.557383 2
6 6 B -3.377152 NaN
7 7 NaN 4.508899 -2
8 8 NaN -1.919803 -1
9 9 A 2.185200 0
10 10 NaN 3.821221 1
11 11 C 0.922389 2
12 12 NaN 2.165784 NaN
[13 rows x 4 columns]
Don't quite get your last question, mind provide an intended result?
now it is clear, my approach:
In [22]:
df=pd.read_clipboard()
df['Around_A']=np.nan
df['Around_B']=np.nan
df['Around_C']=np.nan
for i in range(-2,3):
df.ix[(df.Event=='A').shift(i).fillna(False), 'Around_A']=i
df.ix[(df.Event=='B').shift(i).fillna(False), 'Around_B']=i
df.ix[(df.Event=='C').shift(i).fillna(False), 'Around_C']=i
Data=[]
for s in ['A', 'B', 'C']:
_df=pd.DataFrame(df.groupby('Around_%s'%s).Value.mean())
_df['Event']=s
_df.index.name='AroundTime'
Data.append(_df.reset_index())
print pd.concat(Data)[['Event', 'AroundTime', 'Value']]
Event AroundTime Value
0 A -2 4.171213
1 A -1 -1.627414
2 A 0 3.302643
3 A 1 4.165060
4 A 2 2.739886
0 B -2 4.508899
1 B -1 4.557383
2 B 0 -3.377152
3 B 1 4.508899
4 B 2 -1.919803
0 C -2 2.185200
1 C -1 3.821221
2 C 0 0.922389
3 C 1 2.165780
[14 rows x 3 columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With