
Merge multiple dataframes based on a common column [duplicate]

I have three dataframes. They all share a common column (Col1), and I need to merge them on that column without losing any rows.

Input

>>>df1
0 Col1  Col2  Col3
1 data1  3      4
2 data2  4      3
3 data3  2      3
4 data4  2      4
5 data5  1      4

>>>df2
0 Col1  Col4  Col5
1 data1  7      4
2 data2  6      9
3 data3  1      4

>>>df3
0 Col1  Col6  Col7
1 data2  5      8
2 data3  2      7
3 data5  5      3

Expected Output

>>>df
0 Col1   Col2  Col3  Col4  Col5  Col6  Col7
1 data1  3     4     7     4
2 data2  4     3     6     9     5     8
3 data3  2     3     1     4     2     7
4 data4  2     4
5 data5  1     4                 5     3
FunnyCoder asked Sep 07 '18 12:09




2 Answers

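Use pd.merge with functools.reduce. reduce merges the dataframes pairwise from left to right, and how='outer' keeps every Col1 value that appears in any of them. Columns that end up with missing values are converted to float so they can hold NaN.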

In [86]: from functools import reduce

In [87]: reduce(lambda x,y: pd.merge(x,y, on='Col1', how='outer'), [df1, df2, df3])
Out[87]:
    Col1  Col2  Col3  Col4  Col5  Col6  Col7
0  data1     3     4   7.0   4.0   NaN   NaN
1  data2     4     3   6.0   9.0   5.0   8.0
2  data3     2     3   1.0   4.0   2.0   7.0
3  data4     2     4   NaN   NaN   NaN   NaN
4  data5     1     4   NaN   NaN   5.0   3.0

Details (the input dataframes)

In [88]: df1
Out[88]:
    Col1  Col2  Col3
0  data1     3     4
1  data2     4     3
2  data3     2     3
3  data4     2     4
4  data5     1     4

In [89]: df2
Out[89]:
    Col1  Col4  Col5
0  data1     7     4
1  data2     6     9
2  data3     1     4

In [90]: df3
Out[90]:
    Col1  Col6  Col7
0  data2     5     8
1  data3     2     7
2  data5     5     3
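
As a self-contained sketch of the approach above, the session can be reproduced from scratch. The sample data is copied from the question; the helper name merge_all and its key parameter are made up for illustration and are not part of pandas.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3", "data4", "data5"],
    "Col2": [3, 4, 2, 2, 1],
    "Col3": [4, 3, 3, 4, 4],
})
df2 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3"],
    "Col4": [7, 6, 1],
    "Col5": [4, 9, 4],
})
df3 = pd.DataFrame({
    "Col1": ["data2", "data3", "data5"],
    "Col6": [5, 2, 5],
    "Col7": [8, 7, 3],
})


def merge_all(frames, key="Col1"):
    """Outer-merge a list of dataframes pairwise on `key` (hypothetical helper)."""
    # reduce computes merge(merge(frames[0], frames[1]), frames[2]) and so on.
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        frames,
    )


merged = merge_all([df1, df2, df3])
print(merged)
```

Because the list can hold any number of dataframes, the same call should work unchanged if a fourth dataframe with a Col1 column is added.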
Zero answered Sep 19 '22 11:09


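Using pd.concat: set Col1 as the index of each dataframe, then concatenate along the columns (axis=1) so that rows are aligned on Col1: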

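# Use the shared key as the index of each dataframe (inplace=True modifies df1, df2, df3)
df1.set_index('Col1',inplace=True)
df2.set_index('Col1',inplace=True)
df3.set_index('Col1',inplace=True)
# axis=1 aligns rows on the index; keys missing from a dataframe produce NaN
df = pd.concat([df1,df2,df3],axis=1,sort=False).reset_index()
# rename returns a new dataframe, so assign it back (a no-op if the column is already named Col1)
df = df.rename(columns = {'index':'Col1'})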

    Col1    Col2    Col3    Col4    Col5    Col6    Col7
0   data1   3       4       7.0     4.0     NaN     NaN
1   data2   4       3       6.0     9.0     5.0     8.0
2   data3   2       3       1.0     4.0     2.0     7.0
3   data4   2       4       NaN     NaN     NaN     NaN
4   data5   1       4       NaN     NaN     5.0     3.0
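
A variant of the same idea that leaves the original dataframes untouched might look like the sketch below. It assumes the Col1 values are unique within each dataframe, uses the sample data from the question, and the helper name concat_on_key is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3", "data4", "data5"],
    "Col2": [3, 4, 2, 2, 1],
    "Col3": [4, 3, 3, 4, 4],
})
df2 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3"],
    "Col4": [7, 6, 1],
    "Col5": [4, 9, 4],
})
df3 = pd.DataFrame({
    "Col1": ["data2", "data3", "data5"],
    "Col6": [5, 2, 5],
    "Col7": [8, 7, 3],
})


def concat_on_key(frames, key="Col1"):
    """Column-wise concat aligned on `key` (hypothetical helper)."""
    # set_index without inplace=True returns new frames, so the inputs are not modified.
    indexed = [frame.set_index(key) for frame in frames]
    # axis=1 aligns rows on the index; keys missing from a frame produce NaN.
    combined = pd.concat(indexed, axis=1, sort=False)
    # Name the index explicitly so reset_index yields a column called `key`.
    return combined.rename_axis(key).reset_index()


combined = concat_on_key([df1, df2, df3])
print(combined)
```

Naming the index with rename_axis before reset_index avoids depending on whether concat preserved the index name, so no separate rename step is needed.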
Space Impact answered Sep 17 '22 11:09