Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two DataFrames based on columns and values of a specific column with Pandas in Python 3.x

Hello i have a problem which i am not able to implement a solution on. I have following two DataFrames:

>>> df1
A  B   date
1  1  01-2016
2  1  02-2017
1  2  03-2017
2  2  04-2020

>>> df2
A  B  01-2016  02-2017  03-2017  04.2020
1  1    0.10    0.22     0.55     0.77
2  1    0.20    0.12     0.99     0.125
1  2    0.13    0.15     0.15     0.245
2  2    0.33    0.1      0.888    0.64

What i want is following DataFrame:

>>> df3
A  B   date      value
1  1  01-2016    0.10
2  1  02-2017    0.12
1  2  03-2017    0.15
2  2  04-2020    0.64

I already tried following:

        summarize_dates = self.summarize_specific_column(data=df1, column='date')

        for date in summarize_dates:
            left_on = np.append(left_on, date)
            right_on = np.append(right_on, merge_columns.upper())
            result = pd.merge(left=df2, right=df1,
                              left_on=left_on, right_on=right_on,
                              how='right')
            print(result)

This does not work. Can you help me and suggest a more comfortable implementation? Manyy thanks in advance!

like image 821
Michael Gann Avatar asked Nov 08 '18 19:11

Michael Gann


People also ask

How do you join two pandas Dataframes using the common column of both Dataframes which function can be used?

The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.


2 Answers

Using lookup

df1['value']=df2.set_index(['A','B']).lookup(df1.set_index(['A','B']).index,df1.date)
df1
Out[228]: 
   A  B     date  value
0  1  1  01-2016   0.10
1  2  1  02-2017   0.12
2  1  2  03-2017   0.15
3  2  2  04-2020   0.64
like image 44
BENY Avatar answered Sep 21 '22 02:09

BENY


You can melt df2 and then merge using the default 'inner' merge

df3 = df1.merge(df2.melt(id_vars = ['A', 'B'], var_name='date'))

    A   B   date    value
0   1   1   01-2016 0.10
1   2   1   02-2017 0.12
2   1   2   03-2017 0.15
3   2   2   04-2020 0.64
like image 152
Vaishali Avatar answered Sep 19 '22 02:09

Vaishali