I have two dataframes with a datetime index.
import pandas as pd
d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)
d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)
df1:
            x
dat          
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6
df2:
             y
dat           
2016-01-01  10
2017-01-01  11
I would like to join them using only the year and month parts of the index, so the output would look like the following:
df3:
            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
I have tried to join them using
df1.join(df2, how='inner')
and I know that I can extract the year and month parts like so:
df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))
But how can I combine all of these to achieve the desired result?
Many thanks
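For context on why the attempted join falls short: df1.join(df2, how='inner') aligns on exact timestamps, so only 2016-01-01 and 2017-01-01 survive. The strftime('%Y-%m') strings from the question can themselves serve as the join key. A minimal sketch, rebuilding the question's frames so it runs on its own; the temporary key column ym is a hypothetical name:

```python
import pandas as pd

# Rebuild the question's frames so the snippet runs on its own
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6]},
                   index=pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03',
                                           '2017-01-01', '2017-01-02', '2017-01-03'],
                                          name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# The plain index join only matches identical timestamps: 2 rows, not 6
naive = df1.join(df2, how='inner')

# Build a 'YYYY-MM' string key on both sides (hypothetical column 'ym'),
# merge on it, then restore df1's dates as the index and drop the key
left = df1.assign(ym=df1.index.strftime('%Y-%m')).reset_index()
right = df2.assign(ym=df2.index.strftime('%Y-%m'))
df3 = left.merge(right, on='ym', how='left').set_index('dat').drop(columns='ym')
print(df3)
```

Using how='left' keeps every row of df1 even if its month is missing from df2; how='inner' would drop such rows instead.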
The information you want to merge on isn't explicitly defined anywhere, and there isn't a nice way to keep your dates in the index through the merge without destroying them. So we move the index into the dataframe proper and create two new columns to merge on, namely year and month. I wrapped this part in a function to make it easier to see what's happening where.
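def f(df):
    # Move the DatetimeIndex into a regular 'dat' column and add year/month keys
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

# Inner-join on the year/month keys; df2's own date column gets the '_' suffix
df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=('', '_'))
# Restore df1's dates as the index and keep only the data columns
df.set_index('dat')[['x', 'y']]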
            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
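If df2 could ever hold two rows for the same month, or df1 could contain months that df2 lacks, two merge parameters help: validate='m:1' raises a MergeError when the right-hand keys are not unique, and how='left' keeps unmatched df1 rows with NaN in y. A minimal sketch reusing the f helper from this answer; the extra 2018-02-01 row is made-up data to show the unmatched case:

```python
import pandas as pd

def f(df):
    # Same helper as above: index -> 'dat' column, plus year/month keys
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

# Question data plus one made-up row (2018-02) that has no match in df2
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03', '2018-02-01']
df1 = pd.DataFrame({'x': range(1, 8)}, pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# how='left' keeps every df1 row; validate='m:1' checks df2 has one row per month
out = (f(df1)
       .merge(f(df2), on=['year', 'month'], how='left',
              suffixes=('', '_'), validate='m:1')
       .set_index('dat')[['x', 'y']])
print(out)

# A duplicated month on the right is rejected instead of silently duplicating rows
dup = pd.concat([df2, df2.iloc[[0]]])
try:
    f(df1).merge(f(dup), on=['year', 'month'], validate='m:1')
    rejected = False
except pd.errors.MergeError as err:
    rejected = True
    print('MergeError:', err)
```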
This is a different approach using pd.Index.map and to_period. Create a dictionary from df2 that maps each year/month period to the corresponding value in column y. Then use map to translate the period-converted dates in df1.index into the matching y values.
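# Map each month period in df2 to its y value
m = dict(zip(df2.index.to_period('M'), df2.y))
# Convert df1's dates to month periods and look each one up in m
df1.assign(y=df1.index.to_period('M').map(m.get))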
            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
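A variation on the same idea passes a Series instead of m.get: Index.map then looks each period up in the Series index and returns NaN for months that df2 lacks, whereas m.get returns None for them. A minimal sketch; the 2018-02-01 row is made-up data added to show the unmatched case, and the name lookup is hypothetical:

```python
import pandas as pd

# Question data plus one made-up row (2018-02) that has no match in df2
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03', '2018-02-01']
df1 = pd.DataFrame({'x': range(1, 8)}, pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# Series of y values indexed by month period (hypothetical name 'lookup')
lookup = pd.Series(df2['y'].to_numpy(), index=df2.index.to_period('M'))

# Each date in df1 becomes a month period and is looked up in the Series;
# months missing from df2 (here 2018-02) come back as NaN
out = df1.assign(y=df1.index.to_period('M').map(lookup))
print(out)
```

The NaN forces y to a float column in this variant.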
Setup
dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))
You could use merge with assign on the year and month from the DatetimeIndex:
# Add year/month keys; reset_index so df1's 'dat' dates survive the column merge
df3 = (df1.assign(year=df1.index.year, month=df1.index.month).reset_index()
       .merge(df2.assign(year=df2.index.year, month=df2.index.month),
              on=['year', 'month'])
       .set_index('dat')
       .drop(['year', 'month'], axis=1))
Output:
            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
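If you would rather keep df1's DatetimeIndex through the whole operation, DataFrame.join can match a column of df1 against the index of df2. A minimal sketch that swaps the separate year/month columns for a single monthly Period key; the helper name join_on_month and the temporary column _month are hypothetical:

```python
import pandas as pd

def join_on_month(left, right):
    """Left-join `right` onto `left` by calendar month (hypothetical helper)."""
    # Re-key the right frame by month period, e.g. Period('2016-01', 'M')
    right = right.set_axis(right.index.to_period('M'))
    # Add a temporary month key to the left frame and join it to right's index;
    # a left join with on=... keeps left's original DatetimeIndex
    keyed = left.assign(_month=left.index.to_period('M'))
    return keyed.join(right, on='_month').drop(columns='_month')

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

df3 = join_on_month(df1, df2)
print(df3)
```

A single Period key carries both year and month, so there are fewer helper columns to create and drop afterwards.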