Merge Pandas Dataframes based on date

Question

I'm looking to merge two pandas DataFrames based on date. The issue is that the second DataFrame does not include every date from the first. I need to keep every date from df1 and pair it with the latest available value from df2.

+-------------+---------------+-------------+
| DataFrame 1 |               |             |
+-------------+---------------+-------------+
| Date        |  Sales loc1   |  Sales loc2 |
| 1/1/17      |  100          |  95         |
| 1/2/17      |  125          |  124        |
| 1/3/17      |  115          |  152        |
| ...         |               |             |
| 2/1/17      |  110          |  111        |
+-------------+---------------+-------------+


+-------------+---------+------+
| DataFrame 2 |         |      |
+-------------+---------+------+
| Date        |  exp    |  loc |
| 1/1/17      |  100    |  1   |
| 1/1/17      |  125    |  2   |
| 2/1/17      |  115    |  1   |
| 2/1/17      |  110    |  2   |
+-------------+---------+------+


+---------------+---------------+--------------+------------+-------------+
| New Dataframe |               |              |            |             |
+---------------+---------------+--------------+------------+-------------+
| Date          |  Sales loc1   |  Sales loc2  |  exp loc1  |  exp loc2   |
| 1/1/17        |  100          |  95          |  100       |  125        |
| 1/2/17        |  125          |  124         |  100       |  125        |
| 1/3/17        |  115          |  152         |  100       |  125        |
| ...           |               |              |            |             |
| 2/1/17        |  110          |  111         |  115       |  110        |
+---------------+---------------+--------------+------------+-------------+

Each value from df2 should be carried forward across rows until a new value appears in df2.

Thanks a lot for your time.

cs95 · Accepted Answer

A generalised solution, for when df2 can hold any number of rows for the same Date, involves three steps:

First, merging df1 and df2 using merge
Next, using groupby + apply to flatten the dataframe
Finally, a little cleanup to fix the column names using rename and add_prefix
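
For anyone who wants to run the snippets in this answer, here is a minimal setup sketch. It assumes df1 and df2 look like the tables in the question, with Date stored as plain strings; only the rows actually shown there are included, and the elided "..." rows are left out.

```python
import pandas as pd

# Sample frames rebuilt from the rows shown in the question (elided rows omitted).
df1 = pd.DataFrame({
    'Date': ['1/1/17', '1/2/17', '1/3/17', '2/1/17'],
    'Sales loc1': [100, 125, 115, 110],
    'Sales loc2': [95, 124, 152, 111],
})

df2 = pd.DataFrame({
    'Date': ['1/1/17', '1/1/17', '2/1/17', '2/1/17'],
    'exp': [100, 125, 115, 110],
    'loc': [1, 2, 1, 2],
})
```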

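import pandas as pd

# Inner merge on Date (the default), then collect each date's exp values into a list.
v = df1.merge(df2[['Date', 'exp']], on='Date')\
       .groupby(df1.columns.tolist())['exp']\
       .apply(pd.Series.tolist)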

# Expand the lists into one column per location: exp loc1, exp loc2, ...
df = pd.DataFrame(v.tolist(), index=v.index)\
       .rename(columns=lambda x: x + 1)\
       .add_prefix('exp loc')\
       .reset_index()

df

     Date  Sales loc1  Sales loc2  exp loc1  exp loc2
0  1/1/17         100          95       100       125
1  2/1/17         110         111       115       110

Here's another solution that should work nicely if df2 has exactly two (or, in general, exactly N) rows per Date, with the rows for each date stored together and in the same loc order.

n = 2  # number of rows (locations) per Date in df2
v = pd.DataFrame(
     df2['exp'].values.reshape(-1, n),  # one row per Date, one column per location
     index=df2['Date'].unique(),
     columns=range(1, n + 1)
).add_prefix('exp loc')\
 .rename_axis('Date')\
 .reset_index()

Now, it's just a simple merge with df1 on Date.

df1.merge(v, on='Date')

     Date  Sales loc1  Sales loc2  exp loc1  exp loc2
0  1/1/17         100          95       100       125
1  2/1/17         110         111       115       110
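
The reshape above depends on the row order of df2. If that order can't be guaranteed, one possible alternative (not part of the original answer) is to let the loc column decide which column each value lands in, using pivot. This is only a sketch: it assumes each (Date, loc) pair appears at most once in df2, and the sample df2 is rebuilt from the question.

```python
import pandas as pd

# Sample df2 rebuilt from the question.
df2 = pd.DataFrame({
    'Date': ['1/1/17', '1/1/17', '2/1/17', '2/1/17'],
    'exp': [100, 125, 115, 110],
    'loc': [1, 2, 1, 2],
})

# One row per Date, one column per loc value; pivot raises if a (Date, loc) pair repeats.
v = df2.pivot(index='Date', columns='loc', values='exp').add_prefix('exp loc')
v.columns.name = None  # drop the leftover 'loc' label on the column axis
v = v.reset_index()
```

The resulting v has the columns Date, exp loc1 and exp loc2, so it can be merged with df1 in the same way.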

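Or, as @A. Leistra pointed out, to keep every date from df1 and carry the last known exp values forward, use a left outer merge followed by ffill. This relies on df1 already being in chronological order, and the exp columns come out as floats because the merge introduces NaN before the fill: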

df1.merge(v, how='left', on='Date').ffill()

     Date  Sales loc1  Sales loc2  exp loc1  exp loc2
0  1/1/17         100          95     100.0     125.0
1  1/2/17         125         124     100.0     125.0
2  1/3/17         115         152     100.0     125.0
3  2/1/17         110         111     115.0     110.0
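
If the dates can be parsed as real datetimes, pd.merge_asof is another way to express "latest value from df2 on or before each df1 date". This is a different technique from the merge + ffill above, shown only as a sketch: it assumes the month/day/two-digit-year format from the question, and the frames here (including v, the wide exp table) are rebuilt by hand from the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['1/1/17', '1/2/17', '1/3/17', '2/1/17'],
    'Sales loc1': [100, 125, 115, 110],
    'Sales loc2': [95, 124, 152, 111],
})
# Wide exp table, one row per date, as produced earlier in this answer.
v = pd.DataFrame({
    'Date': ['1/1/17', '2/1/17'],
    'exp loc1': [100, 115],
    'exp loc2': [125, 110],
})

# merge_asof needs sortable keys of the same dtype, so parse the strings first.
fmt = '%m/%d/%y'  # assumed date format
left = df1.assign(Date=pd.to_datetime(df1['Date'], format=fmt)).sort_values('Date')
right = v.assign(Date=pd.to_datetime(v['Date'], format=fmt)).sort_values('Date')

# Default direction='backward': each left row takes the last right row with Date <= its own.
result = pd.merge_asof(left, right, on='Date')
```

Unlike ffill, this only touches the columns coming from v, so gaps in the Sales columns (if any) are left alone.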

Tags:

python

pandas

dataframe

user3029296

1 Answers

cs95
