Im trying to extract missing date list for a date range in columns DATE FROM and DATE TO for multiple groups in column CURRENCY, the ranges is splited in multiple rows for each group column CURRENCY :
For example : Currency EUR have three date ranges in row 0, 1 and 2, and the missing range in the entire group is 2021-10-06 to 2021-10-10.
Missing ranges is highlited below using * symbol, This missing date range is what i desire in expected output.
df = pd.DataFrame({"CURRENCY":{"0":"EUR","1":"EUR","2":"EUR","3":"GBP","4":"GBP","5":"GBP","6":"USD","7":"USD","8":"SAR","9":"SAR"},
"DATE FROM":{"0":"2021-10-01","1":"2021-10-11","2":"2021-10-19","3":"2021-10-01","4":"2021-10-05",
"5":"2021-10-11","6":"2021-10-01","7":"2021-10-05","8":"2021-10-01","9":"2021-10-05"},
"DATE TO":{"0":"2021-10-05","1":"2021-10-18","2":"2021-10-23","3":"2021-10-04","4":"2021-10-07",
"5":"2021-10-18","6":"2021-10-02","7":"2021-10-10","8":"2021-10-01","9":"2021-10-10"}})
CURRENCY DATE FROM DATE TO
0 EUR 2021-10-01 2021-10-05*
1 EUR 2021-10-11* 2021-10-18
2 EUR 2021-10-19 2021-10-23
3 GBP 2021-10-01 2021-10-04
4 GBP 2021-10-05 2021-10-07*
5 GBP 2021-10-11* 2021-10-18
6 USD 2021-10-01 2021-10-02*
7 USD 2021-10-05* 2021-10-10
8 SAR 2021-10-01 2021-10-01*
9 SAR 2021-10-05* 2021-10-10
Expected output:
CURRENCY MISSING
0 EUR 2021-10-06
1 EUR 2021-10-07
2 EUR 2021-10-08
3 EUR 2021-10-09
4 EUR 2021-10-10
5 GBP 2021-10-08
6 GBP 2021-10-09
7 GBP 2021-10-10
8 USD 2021-10-03
9 USD 2021-10-04
10 SAR 2021-10-02
11 SAR 2021-10-03
12 SAR 2021-10-04
Below is what i have tried, but it seems this is not efficient way to resolve, also im not sure how to group those output ranges to find the missing dates for each group (EUR,GBP...etc) :
date_from_list = df['DATE FROM'].to_list()
date_to_list = df['DATE TO'].to_list()
curr_list = df['CURRENCY'].to_list()
for date_from, date_to, curr in zip(date_from_list, date_to_list, curr_list):
print(curr_list, pd.date_range(date_from, date_to))
Both pd.date_range and pd.period_range can do. I have used date range which only has close either right or left, so you may have to do some filter. Code below
df= df.assign(end=df['DATE FROM'].shift(-1),start=df['DATE TO']).iloc[:-1 , :]#Define the start and end for date range
df=df.assign(Missing=df.apply(lambda x: pd.date_range(start=x['start'], end=x['end'], closed='right').tolist(), axis = 1)).explode('Missing').drop_duplicates('Missing').drop(['start','end'],axis=1)
CURRENCY DATE FROM DATE TO Missing
0 EUR 2021-10-01 2021-10-05 2021-10-06
0 EUR 2021-10-01 2021-10-05 2021-10-07
0 EUR 2021-10-01 2021-10-05 2021-10-08
0 EUR 2021-10-01 2021-10-05 2021-10-09
0 EUR 2021-10-01 2021-10-05 2021-10-10
0 EUR 2021-10-01 2021-10-05 2021-10-11
1 EUR 2021-10-11 2021-10-18 2021-10-19
2 EUR 2021-10-19 2021-10-23 NaT
3 GBP 2021-10-01 2021-10-04 2021-10-05
6 USD 2021-10-01 2021-10-02 2021-10-03
6 USD 2021-10-01 2021-10-02 2021-10-04
8 SAR 2021-10-01 2021-10-01 2021-10-02
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With