
Join dataframes using parts of a datetime index

I have two dataframes, each with a datetime index.

import pandas as pd

d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)

d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)

df1:

            x
dat          
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6

df2:

             y
dat           
2016-01-01  10
2017-01-01  11

I would like to join them using only the year and month parts of the index, so the output would look like the following:

df3:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11

I have tried to join them using

df1.join(df2, how='inner')

and I know that I can extract year and month parts like so:

df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))

But how can I combine these to achieve the desired result?

Many thanks

olyashevska asked Jun 20 '17 15:06



2 Answers

The information you want to merge on isn't explicitly defined anywhere, and there isn't a nice way to keep your dates in the index through a merge without destroying them. So we move the indices into the dataframes proper and create two new columns to merge on: year and month. I wrapped this part in a function to make it easier to see what happens where.

def f(df):
    df = df.reset_index()
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])

df.set_index('dat')[['x', 'y']]

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
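
To see why the final set_index('dat')[['x', 'y']] step works, it can help to look at the merged frame before it is trimmed. The following is a minimal sketch, assuming the same df1 and df2 as in the question; the helper name add_year_month is made up for illustration and simply mirrors f above.

```python
import pandas as pd

df1 = pd.DataFrame({'x': range(1, 7)},
                   pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03',
                                     '2017-01-01', '2017-01-02', '2017-01-03'], name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

def add_year_month(df):
    # Hypothetical helper: move the index into a 'dat' column and add the merge keys
    df = df.reset_index()
    return df.assign(year=df['dat'].dt.year, month=df['dat'].dt.month)

merged = add_year_month(df1).merge(add_year_month(df2),
                                   on=['year', 'month'], suffixes=('', '_'))

# Both inputs carry a 'dat' column; the suffixes keep df1's as 'dat'
# and rename df2's to 'dat_', so df1's dates can become the index again
print(sorted(merged.columns))
result = merged.set_index('dat')[['x', 'y']]
print(result)
```

The leftover columns (year, month and dat_) are discarded by the final column selection.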

This is a different concept using pd.Index.map and to_period. Create a dictionary mapping from df2 that translates the year/month period object to the corresponding value in column y. Then use map to map the period-ized dates in df1.index to the correct y values.

m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
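
A variant of the same lookup idea swaps the dictionary for a Series keyed by month period and uses reindex. This is only a sketch, not the code above: it assumes df2 has at most one row per month, and the small frames here are made up so that one date in df1 (2018-01-01) has no matching month in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3]},
                   pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2018-01-01'], name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

# Re-key df2's y values by month period (assumes one row per month in df2)
y_by_month = pd.Series(df2['y'].to_numpy(), index=df2.index.to_period('M'))

# Look up the month of every df1 date; months absent from df2 come back as NaN
looked_up = y_by_month.reindex(df1.index.to_period('M'))

# Attach the values positionally so df1 keeps its original DatetimeIndex
df3 = df1.assign(y=looked_up.to_numpy())
print(df3)
```

Rows whose month is missing from df2 end up with NaN in y, which also turns the column into floats.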

Setup

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

piRSquared answered Oct 14 '22 07:10


You could use merge with assign on the year and month from the DatetimeIndex:

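# Recent pandas versions reject on= combined with right_index=True (MergeError),
# so move df1's dates into a column for the merge and restore them afterwards
df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
          .reset_index()
          .merge(df2.assign(year=df2.index.year, month=df2.index.month), on=['year', 'month'])
          .set_index('dat')
          .drop(['year', 'month'], axis=1))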

Output:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
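
If the second frame could ever contain two rows for the same month, a merge like this would silently duplicate rows from the first frame. Here is a hedged sketch of one way to guard against that, using the strftime('%Y-%m') key from the question together with merge's validate option. The function name join_on_year_month and the _ym column are made up for illustration, and both frames are assumed to have a named DatetimeIndex.

```python
import pandas as pd

def join_on_year_month(left, right):
    # Hypothetical helper: build a 'YYYY-MM' string key on both sides
    left_keyed = left.assign(_ym=left.index.strftime('%Y-%m')).reset_index()
    right_keyed = right.assign(_ym=right.index.strftime('%Y-%m')).reset_index(drop=True)
    # validate='many_to_one' raises MergeError if a month appears twice in `right`
    merged = left_keyed.merge(right_keyed, on='_ym', how='left', validate='many_to_one')
    # Restore left's dates as the index and drop the temporary key
    return merged.set_index(left.index.name).drop(columns='_ym')

df1 = pd.DataFrame({'x': range(1, 7)},
                   pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03',
                                     '2017-01-01', '2017-01-02', '2017-01-03'], name='dat'))
df2 = pd.DataFrame({'y': [10, 11]},
                   pd.DatetimeIndex(['2016-01-01', '2017-01-01'], name='dat'))

df3 = join_on_year_month(df1, df2)
print(df3)
```

The how='left' choice keeps every row of the first frame even when a month has no match; switching to how='inner' would drop such rows instead.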

Scott Boston answered Oct 14 '22 08:10