
How do I filter out the rows that come before the row containing a certain value, for each group in a DataFrame?

How do I keep only the rows from the first 'click' in the column 'action_type' onward, for each client_id in the toy data below?

import pandas as pd

df = pd.DataFrame({
    'user_client_id': [1] * 15 + [2] * 17,
    'timestamp': [
        '2021-12-18 09:15:59', '2021-12-18 10:33:49', '2021-12-18 10:34:08', '2021-12-18 10:34:09',
        '2021-12-18 10:57:02', '2021-12-18 10:57:33', '2021-12-18 10:58:01', '2021-12-18 10:58:02',
        '2021-12-18 10:58:17', '2021-12-18 10:58:29', '2021-12-18 10:58:31', '2021-12-18 10:58:34',
        '2021-12-18 10:58:34', '2021-12-18 10:58:47', '2021-12-18 10:59:12', '2021-12-18 10:59:28',
        '2021-12-18 10:59:35', '2021-12-18 10:59:38', '2021-12-18 11:05:13', '2021-12-18 11:05:58',
        '2021-12-18 11:06:08', '2021-12-18 11:06:10', '2021-12-18 11:06:12', '2021-12-18 11:07:42',
        '2021-12-18 11:10:07', '2021-12-18 11:10:23', '2021-12-18 11:10:53', '2021-12-18 11:10:58',
        '2021-12-18 11:13:04', '2021-12-18 11:13:06', '2021-12-18 14:56:32', '2021-12-18 17:16:40',
    ],
    # Column name without the stray trailing space, so df['action_type'] works
    'action_type': [
        'to_cart', 'to_cart', 'to_cart', 'to_cart', 'click', 'to_cart', 'to_cart', 'increment',
        'remove', 'to_cart', 'increment', 'click', 'to_cart', 'increment', 'to_cart', 'to_cart',
        'remove', 'to_cart', 'increment', 'to_cart', 'to_cart', 'click', 'increment', 'to_cart',
        'to_cart', 'to_cart', 'click', 'increment', 'to_cart', 'increment', 'to_cart', 'increment',
    ],
})

For the client with id 1, everything that comes before the click at 2021-12-18 10:57:02 should be filtered out. For the client with id 2, everything that comes before the click at 2021-12-18 11:06:10 should be filtered out.

I've tried the following, but it only works for client 1 and not for client 2:

df.iloc[df.loc[df['action_type']=='click'].index[0]:,:]
bigsbi asked Dec 26 '21 21:12



1 Answer

You can build a boolean mask with groupby and cummax. Within each group, the mask becomes True at the first 'click' and stays True for every row after it, so the first click row itself is kept.

m = (df['action_type'].eq('click')
       .groupby(df['user_client_id'])
       .cummax()
     )

df[m]

Output:

    user_client_id            timestamp action_type
4                1  2021-12-18 10:57:02       click
5                1  2021-12-18 10:57:33     to_cart
6                1  2021-12-18 10:58:01     to_cart
7                1  2021-12-18 10:58:02   increment
8                1  2021-12-18 10:58:17      remove
9                1  2021-12-18 10:58:29     to_cart
10               1  2021-12-18 10:58:31   increment
11               1  2021-12-18 10:58:34       click
12               1  2021-12-18 10:58:34     to_cart
13               1  2021-12-18 10:58:47   increment
14               1  2021-12-18 10:59:12     to_cart
21               2  2021-12-18 11:06:10       click
22               2  2021-12-18 11:06:12   increment
23               2  2021-12-18 11:07:42     to_cart
24               2  2021-12-18 11:10:07     to_cart
25               2  2021-12-18 11:10:23     to_cart
26               2  2021-12-18 11:10:53       click
27               2  2021-12-18 11:10:58   increment
28               2  2021-12-18 11:13:04     to_cart
29               2  2021-12-18 11:13:06   increment
30               2  2021-12-18 14:56:32     to_cart
31               2  2021-12-18 17:16:40   increment
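
To see why the mask works, here is a minimal sketch on a made-up six-row frame (the `toy` data and the names `is_click`, `seen_click`, `n_clicks` are mine, not from the question). It also shows one possible variant that drops the first click row itself, in case "after the click" is meant strictly; that reading is an assumption. Both versions rely on the rows already being in chronological order within each client, so sort by user_client_id and timestamp first if they are not.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
toy = pd.DataFrame({
    'user_client_id': [1, 1, 1, 2, 2, 2],
    'action_type': ['to_cart', 'click', 'increment',
                    'click', 'to_cart', 'click'],
})

# True exactly on the click rows.
is_click = toy['action_type'].eq('click')

# Running maximum per client: False until the first click, True from then on.
seen_click = is_click.groupby(toy['user_client_id']).cummax()

# Rows from the first click onward (first click included).
from_first_click = toy[seen_click]

# Assumed variant: keep only rows strictly after the first click.
# n_clicks counts the clicks seen so far per client, so the first click
# is the click row where that count equals 1.
n_clicks = is_click.astype(int).groupby(toy['user_client_id']).cumsum()
first_click = is_click & n_clicks.eq(1)
after_first_click = toy[seen_click & ~first_click]

print(from_first_click)
print(after_first_click)
```

A client with no click at all gets an all-False mask and is dropped entirely in both versions.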
mozway answered Nov 15 '22 09:11
