How to select the rows that appear before the first occurrence of a certain value in a column?
I have a dataset of user activity with timestamps, recorded as follows:
import pandas as pd
df = pd.DataFrame([{'user_id':1, 'date':'2017-09-01', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-02', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-03', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-04', 'activity':'Click'},
{'user_id':1, 'date':'2017-09-05', 'activity':'Purchase'},
{'user_id':1, 'date':'2017-09-06', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-07', 'activity':'Open'},
{'user_id':2, 'date':'2017-09-04', 'activity':'Open'},
{'user_id':2, 'date':'2017-09-06', 'activity':'Purchase'}])
Is there a way to select, for each user, all the rows that occur before that user's first purchase? In this example, the desired output would be:
df = pd.DataFrame([{'user_id':1, 'date':'2017-09-01', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-02', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-03', 'activity':'Open'},
{'user_id':1, 'date':'2017-09-04', 'activity':'Click'},
{'user_id':2, 'date':'2017-09-04', 'activity':'Open'}])
You can avoid an explicit apply by filtering with a boolean mask built from a per-user cumulative sum:
In [2862]: df[df['activity'].eq('Purchase').groupby(df['user_id']).cumsum().eq(0)]
Out[2862]:
   activity        date  user_id
0      Open  2017-09-01        1
1      Open  2017-09-02        1
2      Open  2017-09-03        1
3     Click  2017-09-04        1
7      Open  2017-09-04        2
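To see why the one-liner works, it may help to split it into steps. The sketch below is only an illustration: the intermediate names (is_purchase, purchases_so_far, before_first_purchase) are made up for readability, and it assumes the rows are already sorted by date within each user, as in the question's sample data.

```python
import pandas as pd

df = pd.DataFrame([
    {'user_id': 1, 'date': '2017-09-01', 'activity': 'Open'},
    {'user_id': 1, 'date': '2017-09-02', 'activity': 'Open'},
    {'user_id': 1, 'date': '2017-09-03', 'activity': 'Open'},
    {'user_id': 1, 'date': '2017-09-04', 'activity': 'Click'},
    {'user_id': 1, 'date': '2017-09-05', 'activity': 'Purchase'},
    {'user_id': 1, 'date': '2017-09-06', 'activity': 'Open'},
    {'user_id': 1, 'date': '2017-09-07', 'activity': 'Open'},
    {'user_id': 2, 'date': '2017-09-04', 'activity': 'Open'},
    {'user_id': 2, 'date': '2017-09-06', 'activity': 'Purchase'},
])

# Step 1: flag the purchase rows (hypothetical helper name).
is_purchase = df['activity'].eq('Purchase')

# Step 2: running count of purchases seen so far, restarted for each user.
# The count is 0 until the user's first purchase row and >= 1 from then on.
purchases_so_far = is_purchase.astype(int).groupby(df['user_id']).cumsum()

# Step 3: keep only the rows where no purchase has been seen yet.
before_first_purchase = df[purchases_so_far.eq(0)]
print(before_first_purchase)
```

Because the running count is already 1 on the purchase row itself, that row is excluded along with everything after it. A user who never purchases keeps a count of 0 throughout, so all of that user's rows are retained. If the data is not ordered by date, sorting it first (for example with df.sort_values(['user_id', 'date'])) would be needed for "before" to mean what you expect.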