Select rows around a value in Pandas

Tags: python, pandas

I have a DataFrame with a continuous measure, marked by occasional events:

TimeIndex  Event    Value
0          NaN     4.099969  
1          NaN     3.833528
2          NaN     -1.335025
3          A       4.420085
4          NaN     4.508899
5          NaN     4.557383
6          B       -3.377152
7          NaN     4.508899
8          NaN     -1.919803
9          A       2.18520
10         NaN     3.821221
11         C       0.922389
12         NaN     2.165784

I want the average Value at each event, and also the average at each of the two time points before and the two time points after the event. Marking the rows like this might work:

TimeIndex  Event    Value      Around_A  Around_B  Around_C
0          NaN     4.099969      NaN       NaN       NaN
1          NaN     3.833528      -2        NaN       NaN
2          NaN     -1.335025     -1        NaN       NaN
3          A       4.420085       0        NaN       NaN
4          NaN     4.508899       1        -2        NaN
5          NaN     4.557383       2        -1        NaN
6          B       -3.377152     NaN        0        NaN
7          NaN     4.508899      -2         1        NaN
8          NaN     -1.919803     -1         2        NaN
9          A       2.18520        0        NaN       -2
10         NaN     3.821221       1        NaN       -1
11         C       0.922389       2        NaN        0
12         NaN     2.165784      NaN       NaN        1

However: 1) I'm unsure how to fill in the new columns without looping, and 2) adding a new column per event becomes unmanageable with many different events (which I have).

Is there an easier way to select timepoints/rows around a value in pandas, and then average by time point/row?

My desired output is the average Value for each Event x AroundTime combination (dummy means shown here):

Event AroundTime  Value.mean
A         -2      3.35
A         -1      0.19
A          0      2.33
A          1     -1.01
A          2      3.78
B         -2      4.53
B         -1      4.22
B          0      5.14
B          1      1.88
B          2      0.70
C         -2     -1.01
C         -1     -2.33
C          0      1.69
C          1      1.19
C          2      2.21
Asked by Amyunimus, Oct 01 '22 12:10


1 Answer

I would suggest:

In [26]:

print(df)
    TimeIndex Event     Value
0           0   NaN  4.099969
1           1   NaN  3.833528
2           2   NaN -1.335025
3           3     A  4.420085
4           4   NaN  4.508899
5           5   NaN  4.557383
6           6     B -3.377152
7           7   NaN  4.508899
8           8   NaN -1.919803
9           9     A  2.185200
10         10   NaN  3.821221
11         11     C  0.922389
12         12   NaN  2.165784

[13 rows x 3 columns]
In [27]:

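import numpy as np

df['Around_A'] = np.nan
In [28]:

for i in range(-2, 3):
    # shift(i) moves each 'A' marker by i rows, so the mask selects rows at offset i from an event
    # .loc is used because chained assignment (df['Around_A'][mask] = i) is unreliable
    # and the .ix indexer has been removed from pandas
    df.loc[(df.Event == 'A').shift(i, fill_value=False), 'Around_A'] = i
In [29]:

print(df)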
    TimeIndex Event     Value  Around_A
0           0   NaN  4.099969       NaN
1           1   NaN  3.833528        -2
2           2   NaN -1.335025        -1
3           3     A  4.420085         0
4           4   NaN  4.508899         1
5           5   NaN  4.557383         2
6           6     B -3.377152       NaN
7           7   NaN  4.508899        -2
8           8   NaN -1.919803        -1
9           9     A  2.185200         0
10         10   NaN  3.821221         1
11         11     C  0.922389         2
12         12   NaN  2.165784       NaN

[13 rows x 4 columns]
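
The loop above relies on how `shift` moves a boolean mask: shifting by a negative number moves each True toward earlier rows, and a positive number moves it toward later rows. A small standalone sketch of that behaviour, using a made-up seven-row `event` Series (the names `event`, `is_a`, `before` and `after` are illustrative only, not from the session above):

```python
import pandas as pd

# Toy event column: a single 'A' at position 3
event = pd.Series([None, None, None, "A", None, None, "B"])

# True only on the row where the event occurs
is_a = event == "A"

# shift(-2) moves the True two rows earlier: the row two steps BEFORE the event
before = is_a.shift(-2, fill_value=False)

# shift(1) moves the True one row later: the row one step AFTER the event
after = is_a.shift(1, fill_value=False)

print(before.tolist())
print(after.tolist())
```

Passing `fill_value=False` keeps the mask boolean, so no separate `fillna(False)` step is needed.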

I don't quite understand your last question. Could you provide the intended result?

Edit

Now it is clear. My approach:

In [22]:

import numpy as np
import pandas as pd

df = pd.read_clipboard()  # the table copied from the question
df['Around_A'] = np.nan
df['Around_B'] = np.nan
df['Around_C'] = np.nan
for i in range(-2, 3):
    # mark rows at offset i from each event (.loc replaces the removed .ix indexer)
    df.loc[(df.Event == 'A').shift(i, fill_value=False), 'Around_A'] = i
    df.loc[(df.Event == 'B').shift(i, fill_value=False), 'Around_B'] = i
    df.loc[(df.Event == 'C').shift(i, fill_value=False), 'Around_C'] = i
Data = []
for s in ['A', 'B', 'C']:
    # mean Value per offset for this event
    _df = pd.DataFrame(df.groupby('Around_%s' % s).Value.mean())
    _df['Event'] = s
    _df.index.name = 'AroundTime'
    Data.append(_df.reset_index())
print(pd.concat(Data)[['Event', 'AroundTime', 'Value']])
  Event  AroundTime     Value
0     A          -2  4.171213
1     A          -1 -1.627414
2     A           0  3.302643
3     A           1  4.165060
4     A           2  2.739886
0     B          -2  4.508899
1     B          -1  4.557383
2     B           0 -3.377152
3     B           1  4.508899
4     B           2 -1.919803
0     C          -2  2.185200
1     C          -1  3.821221
2     C           0  0.922389
3     C           1  2.165780

[14 rows x 3 columns] 
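
With many distinct events, one possible way to avoid a column per event is to build a long table of (Event, AroundTime, Value) rows directly and group it once. The following is a minimal sketch of that idea, not the approach shown above. It rebuilds the question's data by hand and assumes the default integer index, so that index labels equal row positions. The names `window`, `events`, `pieces` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Data re-typed from the question
df = pd.DataFrame({
    "TimeIndex": range(13),
    "Event": [np.nan, np.nan, np.nan, "A", np.nan, np.nan, "B",
              np.nan, np.nan, "A", np.nan, "C", np.nan],
    "Value": [4.099969, 3.833528, -1.335025, 4.420085, 4.508899, 4.557383,
              -3.377152, 4.508899, -1.919803, 2.18520, 3.821221, 0.922389,
              2.165784],
})

window = 2  # rows to look before and after each event (taken from the question)

# Rows where some event occurred
events = df.dropna(subset=["Event"])

pieces = []
for offset in range(-window, window + 1):
    # Row positions at this offset from every event (assumes a default RangeIndex)
    pos = events.index.to_numpy() + offset
    # Drop positions that fall outside the frame
    ok = (pos >= 0) & (pos < len(df))
    pieces.append(pd.DataFrame({
        "Event": events["Event"].to_numpy()[ok],
        "AroundTime": offset,
        "Value": df["Value"].to_numpy()[pos[ok]],
    }))

# One groupby over the long table gives the mean per event and offset
result = (pd.concat(pieces, ignore_index=True)
            .groupby(["Event", "AroundTime"], as_index=False)["Value"].mean())
print(result)
```

Because each (event, offset) pair becomes its own row, a time point that lies in the windows of two events is counted for both, whereas a single `Around_X` column can hold only one offset per row.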
Answered by CT Zhu, Oct 03 '22 07:10