Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to join 2 dataframe on year and month in Pandas?

I have 2 dataframe and I want to join them on the basis of month and year from a date without creating extra columns:

example :

df1 :

date_1      value_1
2017-1-15    20
2017-1-31    30
2016-2-15    20

df2:

date_2      value_2
2017-1-1    30

then get result as below i.e. merged on the basis of month and year.

date_1      value_1 value_2
2017-1-15    20     30
2017-1-31    30     30
2016-2-15    20
like image 850
user3222101 Avatar asked Aug 23 '18 18:08

user3222101


2 Answers

Here's a rudimentary but effective solution:

res = pd.merge(df1.assign(grouper=df1['date_1'].dt.to_period('M')),
               df2.assign(grouper=df2['date_2'].dt.to_period('M')),
               how='left', on='grouper')

print(res)

      date_1  value_1 grouper     date_2  value_2
0 2017-01-15       20 2017-01 2017-01-01     30.0
1 2017-01-31       30 2017-01 2017-01-01     30.0
2 2016-02-15       20 2016-02        NaT      NaN

You can then remove unwanted columns:

res = res[['date_1', 'value_1', 'value_2']]
like image 115
jpp Avatar answered Oct 04 '22 22:10

jpp


Here is another way using lambda functions:

pd.merge(df,df2, left_on=df['date_1'].apply(lambda x: (x.year, x.month)),
         right_on=df2['date_2'].apply(lambda y: (y.year, y.month)),
         how='outer')[['date_1','value_1','value_2']]

    date_1      value_1   value_2
0   2017-01-15    20        30.0
1   2017-01-31    30        30.0
2   2016-02-15    20        NaN
like image 39
It_is_Chris Avatar answered Oct 04 '22 20:10

It_is_Chris