Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find missing dates in multiple date ranges for multi groups

Tags:

python

pandas

Im trying to extract missing date list for a date range in columns DATE FROM and DATE TO for multiple groups in column CURRENCY, the ranges is splited in multiple rows for each group column CURRENCY :

For example : Currency EUR have three date ranges in row 0, 1 and 2, and the missing range in the entire group is 2021-10-06 to 2021-10-10.

Missing ranges is highlited below using * symbol, This missing date range is what i desire in expected output.

df = pd.DataFrame({"CURRENCY":{"0":"EUR","1":"EUR","2":"EUR","3":"GBP","4":"GBP","5":"GBP","6":"USD","7":"USD","8":"SAR","9":"SAR"},
                   "DATE FROM":{"0":"2021-10-01","1":"2021-10-11","2":"2021-10-19","3":"2021-10-01","4":"2021-10-05",
                                "5":"2021-10-11","6":"2021-10-01","7":"2021-10-05","8":"2021-10-01","9":"2021-10-05"},
                   "DATE TO":{"0":"2021-10-05","1":"2021-10-18","2":"2021-10-23","3":"2021-10-04","4":"2021-10-07",
                              "5":"2021-10-18","6":"2021-10-02","7":"2021-10-10","8":"2021-10-01","9":"2021-10-10"}})

    CURRENCY    DATE FROM   DATE TO
0   EUR         2021-10-01  2021-10-05*
1   EUR         2021-10-11* 2021-10-18
2   EUR         2021-10-19  2021-10-23
3   GBP         2021-10-01  2021-10-04
4   GBP         2021-10-05  2021-10-07*
5   GBP         2021-10-11* 2021-10-18
6   USD         2021-10-01  2021-10-02*
7   USD         2021-10-05* 2021-10-10
8   SAR         2021-10-01  2021-10-01*
9   SAR         2021-10-05* 2021-10-10

Expected output:

    CURRENCY    MISSING
0   EUR         2021-10-06
1   EUR         2021-10-07
2   EUR         2021-10-08
3   EUR         2021-10-09
4   EUR         2021-10-10
5   GBP         2021-10-08
6   GBP         2021-10-09
7   GBP         2021-10-10
8   USD         2021-10-03
9   USD         2021-10-04
10  SAR         2021-10-02
11  SAR         2021-10-03
12  SAR         2021-10-04

Below is what i have tried, but it seems this is not efficient way to resolve, also im not sure how to group those output ranges to find the missing dates for each group (EUR,GBP...etc) :

date_from_list = df['DATE FROM'].to_list()
date_to_list = df['DATE TO'].to_list()
curr_list = df['CURRENCY'].to_list()

for date_from, date_to, curr in zip(date_from_list, date_to_list, curr_list):
    print(curr_list, pd.date_range(date_from, date_to))
like image 625
Ibrahim Ayoup Avatar asked Feb 23 '26 06:02

Ibrahim Ayoup


1 Answers

Both pd.date_range and pd.period_range can do. I have used date range which only has close either right or left, so you may have to do some filter. Code below

df= df.assign(end=df['DATE FROM'].shift(-1),start=df['DATE TO']).iloc[:-1 , :]#Define the start and end for date range

df=df.assign(Missing=df.apply(lambda x: pd.date_range(start=x['start'],  end=x['end'], closed='right').tolist(), axis = 1)).explode('Missing').drop_duplicates('Missing').drop(['start','end'],axis=1)

  CURRENCY   DATE FROM     DATE TO    Missing
0      EUR  2021-10-01  2021-10-05 2021-10-06
0      EUR  2021-10-01  2021-10-05 2021-10-07
0      EUR  2021-10-01  2021-10-05 2021-10-08
0      EUR  2021-10-01  2021-10-05 2021-10-09
0      EUR  2021-10-01  2021-10-05 2021-10-10
0      EUR  2021-10-01  2021-10-05 2021-10-11
1      EUR  2021-10-11  2021-10-18 2021-10-19
2      EUR  2021-10-19  2021-10-23        NaT
3      GBP  2021-10-01  2021-10-04 2021-10-05
6      USD  2021-10-01  2021-10-02 2021-10-03
6      USD  2021-10-01  2021-10-02 2021-10-04
8      SAR  2021-10-01  2021-10-01 2021-10-02
like image 168
wwnde Avatar answered Feb 24 '26 20:02

wwnde



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!