join dataframes using parts of datetime index

Tags: python, datetime, pandas

I have two dataframes, each with a datetime index.

import pandas as pd

d = {'dat': ['2016-01-01', '2016-01-02', '2016-01-03', '2017-01-01', '2017-01-02', '2017-01-03'],'x': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d)
df1.set_index(['dat'], inplace=True)
df1.index = pd.to_datetime(df1.index)

d = {'dat': ['2016-01-01', '2017-01-01'],'y': [10, 11]}
df2 = pd.DataFrame(d)
df2.set_index(['dat'], inplace=True)
df2.index = pd.to_datetime(df2.index)

df1:

            x
dat          
2016-01-01  1
2016-01-02  2
2016-01-03  3
2017-01-01  4
2017-01-02  5
2017-01-03  6

df2:

             y
dat           
2016-01-01  10
2017-01-01  11

I would like to join them using only the year and month parts of the index, so that the output looks like the following:

df3:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11

I have tried to join them using

df1.join(df2, how='inner')

and I know that I can extract year and month parts like so:

df1.index.map(lambda x: x.strftime('%Y-%m'))
df2.index.map(lambda x: x.strftime('%Y-%m'))

How can I combine these to achieve the desired result?

Many thanks

asked Jun 20 '17 15:06

olyashevska

2 Answers

The key you want to merge on isn't defined explicitly anywhere, and there isn't a nice way to keep your dates in the index through a merge without destroying it. So we move the index into the dataframe proper and create two new columns to merge on, namely year and month. I wrapped this part in a function to make it easier to see what's happening where.

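def f(df):
    # Move the 'dat' index into a regular column so it survives the merge
    df = df.reset_index()
    # Add the year and month of each date as explicit merge keys
    return df.assign(year=df.dat.dt.year, month=df.dat.dt.month)

# Both frames carry a 'dat' column: df1's keeps its name, df2's becomes 'dat_'
df = f(df1).merge(f(df2), on=['year', 'month'], suffixes=['', '_'])

# Restore df1's dates as the index and keep only the data columns
df.set_index('dat')[['x', 'y']]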

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
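
The merge above quietly relies on df2 having at most one row per year/month; if it had two, the matching rows of df1 would be duplicated. As a minimal sketch, assuming the frames from the Setup section of this answer, merge's validate argument can check that assumption. The helper name with_year_month is made up for this illustration.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))


def with_year_month(df):
    """Hypothetical helper: expose the index plus its year/month as columns."""
    df = df.reset_index()
    return df.assign(year=df['dat'].dt.year, month=df['dat'].dt.month)


merged = with_year_month(df1).merge(
    with_year_month(df2),
    on=['year', 'month'],
    suffixes=('', '_'),
    # Raises pandas.errors.MergeError if df2 has repeated (year, month) keys
    validate='many_to_one',
)

# Put df1's dates back in the index and keep only the data columns
result = merged.set_index('dat')[['x', 'y']]
print(result)
```

With the sample data the check passes silently and result matches the table above.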

A different approach uses pd.Index.map and to_period. Build a dictionary from df2 that maps each year/month Period to the corresponding value in column y, then use map to translate the period-ized dates in df1.index into the correct y values.

m = dict(zip(df2.index.to_period('M'), df2.y))
df1.assign(y=df1.index.to_period('M').map(m.get))

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
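
The dictionary lookup can also be written with a Series indexed by monthly period, which makes the behaviour for months missing from df2 explicit: reindex returns NaN for them. This is only a sketch, assuming the frames from the Setup section; the names lookup, extra and missing are introduced purely for illustration.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

# y values keyed by monthly period instead of by exact date
lookup = pd.Series(df2['y'].to_numpy(), index=df2.index.to_period('M'))

# Look up the month of every df1 date; to_numpy() drops the period index so
# the values are assigned by position rather than aligned on df1's dates
result = df1.assign(y=lookup.reindex(df1.index.to_period('M')).to_numpy())
print(result)

# A date whose month is absent from df2 gives NaN rather than an error
extra = pd.DatetimeIndex(['2018-05-01'], name='dat')
missing = lookup.reindex(extra.to_period('M'))
print(missing)
```

The reindex call requires the periods in lookup to be unique, which holds here because df2 has one row per month.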

Setup

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': range(1, 7)}, pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]}, pd.DatetimeIndex(dates2, name='dat'))

answered Oct 14 '22 07:10

piRSquared

You could use merge with assign on the year and month from the DatetimeIndex, resetting df1's index first so that the dates survive the merge:

df3 = (df1.assign(year=df1.index.year, month=df1.index.month)
      .reset_index()
      .merge(df2.assign(year=df2.index.year, month=df2.index.month), on=['year', 'month'])
      .set_index('dat')
      .drop(['year', 'month'], axis=1))

Output:

            x   y
dat              
2016-01-01  1  10
2016-01-02  2  10
2016-01-03  3  10
2017-01-01  4  11
2017-01-02  5  11
2017-01-03  6  11
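
The same year/month match can also be done with a single key column and DataFrame.join, which leaves df1's index untouched. This is a sketch under the assumption that the frames look like the ones in the question; the column name ym is hypothetical, and the key reuses the strftime('%Y-%m') format from the question.

```python
import pandas as pd

dates1 = ['2016-01-01', '2016-01-02', '2016-01-03',
          '2017-01-01', '2017-01-02', '2017-01-03']
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6]},
                   index=pd.DatetimeIndex(dates1, name='dat'))

dates2 = ['2016-01-01', '2017-01-01']
df2 = pd.DataFrame({'y': [10, 11]},
                   index=pd.DatetimeIndex(dates2, name='dat'))

# Year-month strings act as the join key on both sides
left = df1.assign(ym=df1.index.strftime('%Y-%m'))
right = df2.set_index(df2.index.strftime('%Y-%m'))

# join matches the 'ym' column of `left` against the index of `right`
# and keeps the index of `left`, so no reset_index/set_index is needed
df3 = left.join(right, on='ym').drop(columns='ym')
print(df3)
```

Because join defaults to a left join, dates in df1 whose month has no counterpart in df2 are kept with a missing y.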

answered Oct 14 '22 08:10

Scott Boston
