I'm looking to merge two pandas DataFrames based on date. The issue is that the second DataFrame does not include every date from the first. I need to keep every date from df1 and pair it with the latest value from df2.
+-------------+---------------+-------------+
| DataFrame 1 | | |
+-------------+---------------+-------------+
| Date | Sales loc1 | Sales loc2 |
| 1/1/17 | 100 | 95 |
| 1/2/17 | 125 | 124 |
| 1/3/17 | 115 | 152 |
| ... | | |
| 2/1/17 | 110 | 111 |
+-------------+---------------+-------------+
+-------------+---------+------+
| DataFrame 2 | | |
+-------------+---------+------+
| Date | exp | loc |
| 1/1/17 | 100 | 1 |
| 1/1/17 | 125 | 2 |
| 2/1/17 | 115 | 1 |
| 2/1/17 | 110 | 2 |
+-------------+---------+------+
+---------------+---------------+--------------+------------+-------------+
| New Dataframe | | | | |
+---------------+---------------+--------------+------------+-------------+
| Date | Sales loc1 | Sales loc2 | exp loc1 | exp loc2 |
| 1/1/17 | 100 | 95 | 100 | 125 |
| 1/2/17 | 125 | 124 | 100 | 125 |
| 1/3/17 | 115 | 152 | 100 | 125 |
| ... | | | | |
| 2/1/17 | 110 | 111 | 115 | 110 |
+---------------+---------------+--------------+------------+-------------+
The values from df2 will be used for multiple cells till there is a new value in df2.
Thanks a lot for your time.
A generalised solution, where there can be any number of rows for the same date in Date, would involve:
1. merging df1 and df2 using merge
2. groupby + apply to flatten the dataframe
3. rename and add_prefix to tidy up the column names
v = df1.merge(df2[['Date', 'exp']])\
       .groupby(df1.columns.tolist())\
       .exp\
       .apply(pd.Series.tolist)
df = pd.DataFrame(v.tolist(), index=v.index)\
       .rename(columns=lambda x: x + 1)\
       .add_prefix('exp loc')\
       .reset_index()
df
Date Sales loc1 Sales loc2 exp loc1 exp loc2
0 1/1/17 100 95 100 125
1 2/1/17 110 111 115 110
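For anyone who wants to run the three steps end to end, here is a minimal, self-contained sketch. The sample frames are rebuilt only from the rows shown in the question (the elided "..." rows are left out), and the names merged and wide are just illustrative. Note that the exp values in each list follow the row order of df2, so this assumes df2 is already ordered by loc within each date.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (only the rows shown).
df1 = pd.DataFrame({
    "Date": ["1/1/17", "1/2/17", "1/3/17", "2/1/17"],
    "Sales loc1": [100, 125, 115, 110],
    "Sales loc2": [95, 124, 152, 111],
})
df2 = pd.DataFrame({
    "Date": ["1/1/17", "1/1/17", "2/1/17", "2/1/17"],
    "exp": [100, 125, 115, 110],
    "loc": [1, 2, 1, 2],
})

# Step 1: inner merge on Date, so only dates present in both frames survive.
merged = df1.merge(df2[["Date", "exp"]], on="Date")

# Step 2: group by the original df1 columns and collect each group's exp
# values into a list (one list per df1 row).
v = merged.groupby(df1.columns.tolist())["exp"].apply(list)

# Step 3: spread the lists into columns, renumber them from 1, and prefix.
wide = (
    pd.DataFrame(v.tolist(), index=v.index)
    .rename(columns=lambda x: x + 1)
    .add_prefix("exp loc")
    .reset_index()
)
print(wide)
```

Because the merge is an inner join, dates that appear only in df1 (1/2/17 and 1/3/17 here) are dropped at this stage.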
Here's another solution that should work nicely if you have exactly two (or, in general, exactly N) rows per Date in df2.
n = 2
v = pd.DataFrame(
        df2.exp.values.reshape(-1, n),
        index=df2.Date.unique(),
        columns=range(1, n + 1)
    ).add_prefix('exp loc')\
     .rename_axis('Date')\
     .reset_index()
Now, it's just a simple merge with df1 on Date.
df1.merge(v, on='Date')
Date Sales loc1 Sales loc2 exp loc1 exp loc2
0 1/1/17 100 95 100 125
1 2/1/17 110 111 115 110
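The reshape trick silently depends on df2 having exactly n rows per date, with rows of the same date adjacent and ordered by loc. A hedged sketch of a guard around it is below; reshape_exp is a hypothetical helper name, not part of pandas, and the sample df2 is rebuilt from the question's table.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Date": ["1/1/17", "1/1/17", "2/1/17", "2/1/17"],
    "exp": [100, 125, 115, 110],
    "loc": [1, 2, 1, 2],
})

def reshape_exp(frame, n):
    """Hypothetical helper: widen exp to n columns, checking the row-count assumption."""
    counts = frame.groupby("Date").size()
    if not (counts == n).all():
        raise ValueError(f"expected exactly {n} rows per Date")
    # Sorting keeps rows of one date together and orders them by loc,
    # which is all the reshape needs (the Date order itself is lexical here).
    ordered = frame.sort_values(["Date", "loc"])
    return (
        pd.DataFrame(
            ordered["exp"].to_numpy().reshape(-1, n),
            index=ordered["Date"].unique(),
            columns=range(1, n + 1),
        )
        .add_prefix("exp loc")
        .rename_axis("Date")
        .reset_index()
    )

v_checked = reshape_exp(df2, n=2)
print(v_checked)
```

If a date is missing a location, the helper raises instead of shifting values into the wrong row.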
Or, as @A. Leistra pointed out, you might want a different sort of result using a left outer merge:
df1.merge(v, how='left', on='Date').ffill()
Date Sales loc1 Sales loc2 exp loc1 exp loc2
0 1/1/17 100 95 100.0 125.0
1 1/2/17 125 124 100.0 125.0
2 1/3/17 115 152 100.0 125.0
3 2/1/17 110 111 115.0 110.0
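As a possible alternative to merge followed by ffill (a swapped-in technique, not what the answer above uses), pd.merge_asof expresses "latest value from df2 on or before each df1 date" directly. The sketch below assumes the dates are month/day/year, parses them to real datetimes, and pivots df2 on its loc column rather than relying on row order. The frames are rebuilt from the question's visible rows, and wide and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["1/1/17", "1/2/17", "1/3/17", "2/1/17"],
    "Sales loc1": [100, 125, 115, 110],
    "Sales loc2": [95, 124, 152, 111],
})
df2 = pd.DataFrame({
    "Date": ["1/1/17", "1/1/17", "2/1/17", "2/1/17"],
    "exp": [100, 125, 115, 110],
    "loc": [1, 2, 1, 2],
})

# Assumption: dates are month/day/two-digit-year.
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%y")

# One row per date, one "exp loc<k>" column per location.
wide = (
    df2.pivot(index="Date", columns="loc", values="exp")
    .add_prefix("exp loc")
    .reset_index()
)
wide.columns.name = None  # drop the leftover "loc" axis label

# merge_asof needs both sides sorted on the key; by default it takes the
# most recent right-hand row whose Date is <= the left-hand Date.
result = pd.merge_asof(
    df1.sort_values("Date"),
    wide.sort_values("Date"),
    on="Date",
)
print(result)
```

Unlike the left merge plus ffill, this also picks up df2 values whose dates never appear in df1, and it does not depend on df1 already being in date order.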