Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging two dataframes based on a date between two other dates without a common column

Tags:

I have two dataframes that I need to merge based on whether or not a date value fits in between two other dates. Basically, I need to perform an outer join where B.event_date is between A.start_date and A.end_date. It seems that merge and join always assume a common column which in this case, I do not have.

    A                           B     start_date  end_date        event_date  price 0   2017-03-27  2017-04-20  0   2017-01-20  100 1   2017-01-10  2017-02-01  1   2017-01-27  200  Result      start_date  end_date        event_date  price 0   2017-03-27  2017-04-20   1   2017-01-10  2017-02-01      2017-01-20  100 2   2017-01-10  2017-02-01      2017-01-27  200 
like image 217
P Spence Avatar asked Apr 24 '17 16:04

P Spence


People also ask

Can you concat two Dataframes with different columns?

It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.

Which are the 3 main ways of combining Dataframes together?

Combine data from multiple files into a single DataFrame using merge and concat. Combine two DataFrames using a unique ID found in both DataFrames. Employ to_csv to export a DataFrame in CSV format. Join DataFrames using common fields (join keys).


1 Answers

Create data and format to datetimes:

df_A = pd.DataFrame({'start_date':['2017-03-27','2017-01-10'],'end_date':['2017-04-20','2017-02-01']}) df_B = pd.DataFrame({'event_date':['2017-01-20','2017-01-27'],'price':[100,200]})  df_A['end_date'] = pd.to_datetime(df_A.end_date) df_A['start_date'] = pd.to_datetime(df_A.start_date) df_B['event_date'] = pd.to_datetime(df_B.event_date) 

Create keys to do a cross join:

New in pandas 1.2.0+ how='cross' instead of assigning psuedo keys:

df_merge = df_A.merge(df_B, how='cross') 

Else, with pandas < 1.2.0 use psuedo key to merge on 'key'

df_A = df_A.assign(key=1) df_B = df_B.assign(key=1) df_merge = pd.merge(df_A, df_B, on='key').drop('key',axis=1) 

Filter out records that do not meet criteria of event dates between start and end dates:

df_merge = df_merge.query('event_date >= start_date and event_date <= end_date') 

Join back to original date range table and drop key column

df_out = df_A.merge(df_merge, on=['start_date','end_date'], how='left').fillna('').drop('key', axis=1)  print(df_out) 

Output:

              end_date           start_date           event_date price 0  2017-04-20 00:00:00  2017-03-27 00:00:00                            1  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-20 00:00:00   100 2  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-27 00:00:00   200 
like image 120
Scott Boston Avatar answered Sep 23 '22 01:09

Scott Boston