Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using multiple masks based on ID

I have a dataframe df that consists of two columns: an id, and a date. The id is a number from 1-3 & is not unique; the date is a datetime object.

id, date
1, 2020-5-11
1, 2019-3-2
2, 2018-7-29
3, 2017-2-21
3, 2017-2-23

There also exists a corresponding list which contains the id with a linked start and end date:

id, startdate, enddate
1, 2015-1-1, 2020-1-1
2, 2019-2-1, 2020-2-31
3, 2017-2-1, 2020-2-31

I wish to filter df to exclude all rows where for each id the date is outwith the range of the startdate and enddate as specified in the above list.

I can do it per id, using the below code; but I do not know how to do them all simultaneously.

for each, startdate, enddate in ids:
    mask = (df['date'].dt.date >= startdate) & (df['date'].dt.date <= enddate)
    df[df['id']==each].loc[mask]

The desired output for the dummy example would be a dataframe containing the following (i.e. excluding all rows for which the date is outwith the date range specific for each id)

id, date
1, 2019-3-2
3, 2017-2-21
3, 2017-2-23
like image 832
NotLost Avatar asked Dec 22 '25 13:12

NotLost


1 Answers

A straightforward and efficient approach is to use map to assign the start/end date to each id in df1['id'], then to perform boolean indexing with between:

tmp = df2.set_index('id')

out = df1[df1['date'].between(df1['id'].map(tmp['startdate']),
                              df1['id'].map(tmp['enddate']))]

Output:

   id       date
1   1 2019-03-02
3   3 2017-02-21
4   3 2017-02-23

Timings:

pandas select between start end per group timings

NB. the merge approach only works if df1 has a range index ([0, 1, 2, ...]).

The timings seem to be independent from the size of df2. Here with 100k rows for df1 and between 1 and 100 rows for df2:

pandas select between start end per group

like image 72
mozway Avatar answered Dec 24 '25 02:12

mozway



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!