I have two DataFrames with a datetime index.
import pandas as pd
d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)
d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)
df1:
            x
dat
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6
df2:
             y
dat
2016-01-01  10
2017-01-01  11
I would like to join them using only the year and month parts of the index, so that the output looks like the following:
df3:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
I have tried to join them using
df1.join(df2, how='inner')
and I know that I can extract the year and month parts like so:
df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))
But how can I combine all of this to achieve the desired result?
Many thanks
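A minimal sketch of how the pieces above might fit together, assuming a pandas version where DatetimeIndex.strftime is available. It first shows why the plain join falls short (it compares full timestamps, so only exact dates match) and then uses the "YYYY-MM" strings as a temporary join key. The column name ym is a hypothetical helper, not part of the original question.

```python
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6]},
    index=pd.DatetimeIndex(
        ["2016-01-01", "2016-01-02", "2016-01-03",
         "2017-01-01", "2017-01-02", "2017-01-03"],
        name="dat",
    ),
)
df2 = pd.DataFrame(
    {"y": [10, 11]},
    index=pd.DatetimeIndex(["2016-01-01", "2017-01-01"], name="dat"),
)

# A plain inner join compares full timestamps, so only exact dates survive.
exact = df1.join(df2, how="inner")

# Re-key df2 by its "YYYY-MM" strings, then join df1's own "YYYY-MM"
# strings (held in a temporary, hypothetical 'ym' column) against that index.
right = df2.set_index(df2.index.strftime("%Y-%m"))
df3 = (df1.assign(ym=df1.index.strftime("%Y-%m"))
          .join(right, on="ym")
          .drop(columns="ym"))
print(df3)
```

Because join(on=...) matches a column of the caller against the other frame's index, df1 keeps its original DatetimeIndex.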
Joining two DataFrames can be done in multiple ways (left, right, inner, and outer), depending on which rows must appear in the final DataFrame. When two DataFrames hold different information about the same entities, linked by a common column or index, pandas offers several functions to combine them, such as concat(), merge(), and join().
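As a rough illustration of how the join type changes the result, here is a sketch with hypothetical monthly keys where one month has no match on the right. It uses merge() with its how parameter; the frames and the ym column are made up for the example.

```python
import pandas as pd

# Hypothetical frames: "2016-03" has no matching row on the right.
left = pd.DataFrame({"ym": ["2016-01", "2016-03"], "x": [1, 2]})
right = pd.DataFrame({"ym": ["2016-01"], "y": [10]})

# inner keeps only keys present in both frames.
inner = left.merge(right, on="ym", how="inner")

# left keeps every row of `left`; unmatched rows get NaN in y.
left_join = left.merge(right, on="ym", how="left")

print(inner)
print(left_join)
```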
The information you want to merge on isn't explicitly stored anywhere, and there isn't a nice way to keep your dates in the index through the merge without destroying it. So we move the index into the DataFrame proper and create two new columns to merge on, namely year and month. I wrapped this part in a function to make it easier to see what's happening where.
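def f(df):
    # Move the DatetimeIndex into a regular 'dat' column and add year/month keys.
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

# Merge on the two new key columns; df2's own 'dat' column gets the '_' suffix.
df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=('', '_'))
df.set_index('dat')[['x', 'y']]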
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
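A variant of the same idea, sketched under the assumption that a single monthly Period key is acceptable in place of separate year and month columns. Series.dt.to_period('M') collapses each timestamp to its month, and dropping df2's own dat column before the merge removes the need for suffixes. The helper name with_month is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"x": range(1, 7)},
    index=pd.DatetimeIndex(
        ["2016-01-01", "2016-01-02", "2016-01-03",
         "2017-01-01", "2017-01-02", "2017-01-03"],
        name="dat",
    ),
)
df2 = pd.DataFrame(
    {"y": [10, 11]},
    index=pd.DatetimeIndex(["2016-01-01", "2017-01-01"], name="dat"),
)


def with_month(df):
    """Hypothetical helper: move the index into a 'dat' column and add a monthly key."""
    out = df.reset_index()
    return out.assign(month=out["dat"].dt.to_period("M"))


df3 = (with_month(df1)
       .merge(with_month(df2).drop(columns="dat"), on="month")
       .set_index("dat")[["x", "y"]])
print(df3)
```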
This is a different approach using pd.Index.map and to_period. Create a dictionary from df2 that maps each year/month Period object to the corresponding value in column y. Then use map to translate the period-ized dates in df1.index into the correct y values.
m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
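One detail worth checking with this approach is what happens when a month in df1 has no entry in df2. A minimal sketch with hypothetical data (February 2016 has no lookup value): passing the dictionary itself to Index.map, rather than m.get, yields NaN for unmatched months, whereas m.get would return None for them.

```python
import pandas as pd

# Hypothetical data: February 2016 has no row in df2.
df1 = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.DatetimeIndex(["2016-01-05", "2016-01-20", "2016-02-03"], name="dat"),
)
df2 = pd.DataFrame(
    {"y": [10]},
    index=pd.DatetimeIndex(["2016-01-01"], name="dat"),
)

# Lookup table from monthly Period to the matching y value.
m = dict(zip(df2.index.to_period("M"), df2["y"]))

# With a dict mapper, months missing from m become NaN.
result = df1.assign(y=df1.index.to_period("M").map(m))
print(result)
```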
Setup
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))
You could use merge with assign on the year and month taken from the DatetimeIndex:
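df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
       .reset_index()  # keep 'dat' as a column; on= can't be combined with right_index=True
       .merge(df2.assign(year=df2.index.year, month=df2.index.month),
              on=['year', 'month'])
       .set_index('dat')  # restore the DatetimeIndex
       .drop(['year', 'month'], axis=1))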
Output:
            x   y
dat
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
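One caveat with the merge-based answers: if df2 ever held two rows in the same month, every matching df1 row would be duplicated. A sketch, assuming that situation should be treated as an error, passes validate='many_to_one' to merge so pandas raises pandas.errors.MergeError instead. The names monthly_merge and df2_dup are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"x": range(1, 7)},
    index=pd.DatetimeIndex(
        ["2016-01-01", "2016-01-02", "2016-01-03",
         "2017-01-01", "2017-01-02", "2017-01-03"],
        name="dat",
    ),
)
df2 = pd.DataFrame(
    {"y": [10, 11]},
    index=pd.DatetimeIndex(["2016-01-01", "2017-01-01"], name="dat"),
)
# Hypothetical lookup frame with two rows in January 2016.
df2_dup = pd.DataFrame(
    {"y": [10, 99, 11]},
    index=pd.DatetimeIndex(["2016-01-01", "2016-01-15", "2017-01-01"], name="dat"),
)


def monthly_merge(left, right):
    """Hypothetical helper: merge on year/month, requiring unique months in `right`."""
    return (left.assign(year=left.index.year, month=left.index.month)
                .reset_index()
                .merge(right.assign(year=right.index.year, month=right.index.month),
                       on=["year", "month"], validate="many_to_one")
                .set_index("dat")
                .drop(columns=["year", "month"]))


ok = monthly_merge(df1, df2)

try:
    monthly_merge(df1, df2_dup)
    duplicate_rejected = False
except pd.errors.MergeError:
    duplicate_rejected = True

print(ok)
print("duplicate months rejected:", duplicate_rejected)
```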