I have two dataframes, "bio" with monthly data, where I only need the column bio['Bio_Avg']. And another one "tc" with daily data. Both with Datetime index.
What I need is to merge both keeping daily granularity but adding monthly data repeating the value of the column bio['Bio_Avg'].
print(bio.head(5))
year month ncena ... contingency D Bio_Avg
Date ...
2019-12-01 2019 12 NaN ... NaN 1 38826.48
2019-11-01 2019 11 NaN ... NaN 1 33867.68
2019-10-01 2019 10 NaN ... NaN 1 31358.80
2019-09-01 2019 9 NaN ... NaN 1 29583.84
2019-08-01 2019 8 NaN ... NaN 1 27763.12
and
print(tc.head(5))
exchg_rate
2019-11-01 59.71
2019-11-02 59.74
2019-11-03 59.73
2019-11-04 59.70
2019-11-05 59.65
The expected result should be:
exchg_rate Bio_Avg
2019-11-01 59.71 33867.68
2019-11-02 59.74 33867.68
2019-11-03 59.73 33867.68
2019-11-04 59.70 33867.68
2019-11-05 59.65 33867.68
I've tried:
merge = pd.merge(tc, bio, left_index=True, right_index=True)
print(merge.head(5))
And I get:
exchg_rate year month ... contingency D Bio_Avg
2019-11-01 59.75 2019 11 ... NaN 1 33867.68
2019-12-01 59.94 2019 12 ... NaN 1 38826.48
Off course it keeps only the exact match with the date, but I loose the days in between. What could be an elegant way to solve this without breaking all dataframes?
first we will reset the index on df2 and then merge on your month column, if the month column wasn't present we could use df['Date'].dt.month
#df1 = bio
#df2 = tc
#df2.reset_index(inplace=True)
df2 = df2.reset_index()
merged_df = pd.merge(
df2,
df[["month", "Bio_Avg"]],
left_on=df2['Date'].dt.month,
right_on="month"
).drop("month", axis=1)
print(merged_df)
Date exchg_rate Bio_Avg
0 2019-11-01 59.71 33867.68
1 2019-11-02 59.74 33867.68
2 2019-11-03 59.73 33867.68
3 2019-11-04 59.70 33867.68
4 2019-11-05 59.65 33867.68
Edit:
working with multiple columns to join on, instead of of using your pre-built month + year column, we can extract them from the actual Date
column ensure it's a date time by using pd.to_datetime(your_dfs['Date'])
final =pd.merge(
bio[["Bio_Avg", "Date"]],
tc,
left_on=[bio["Date"].dt.year, bio["Date"].dt.month],
right_on=[tc["Date"].dt.year, tc["Date"].dt.month],
how="right",
suffixes=("bio", "tc_"),
).drop(["key_0", "key_1"], axis=1)
print(final)
Bio_Avg Datebio Datetc_ exchg_rate
0 33867.68 2019-11-01 2019-11-01 59.71
1 33867.68 2019-11-01 2019-11-02 59.74
2 33867.68 2019-11-01 2019-11-03 59.73
3 33867.68 2019-11-01 2019-11-04 59.70
4 33867.68 2019-11-01 2019-11-05 59.65
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With