 

JOIN two dataframes on common column in python

I have a dataframe df:

    id   name   count
    1    a      10
    2    b      20
    3    c      30
    4    d      40
    5    e      50

Here I have another dataframe df2:

    id1  price   rating
    1    100     1.0
    2    200     2.0
    3    300     3.0
    5    500     5.0

I want to join these two dataframes on the columns id and id1 (both refer to the same thing). Here is the expected result, df3:

    id   name   count   price   rating
    1    a      10      100     1.0
    2    b      20      200     2.0
    3    c      30      300     3.0
    4    d      40      NaN     NaN
    5    e      50      500     5.0

Should I use df.merge or pd.concat?
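
For reproducibility, the two frames can be rebuilt roughly like this. This is a minimal sketch: the values are copied from the tables above, while the constructor calls and the resulting dtypes are assumptions, since the post only shows the printed frames.

```python
import pandas as pd

# Values copied from the question's tables; the dtypes are whatever pandas infers.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

# id 4 exists only in df, which is why df3 shows NaN in that row.
missing_ids = sorted(set(df["id"]) - set(df2["id1"]))
print(missing_ids)
```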

Shubham R asked Jan 04 '17 11:01



2 Answers

Use merge (df1 below is the question's df):

    print(pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
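
As for pd.concat from the question: it lines rows up by index label rather than by a column, so it only matches the two frames if each is first indexed by its key. A minimal sketch of that, assuming the frames are rebuilt by hand from the question's tables (the constructors and the name side_by_side are illustrative, not from the original post):

```python
import pandas as pd

# Frames rebuilt from the question's tables (an assumption for illustration).
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": ["a", "b", "c", "d", "e"],
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# concat(axis=1) puts frames side by side and matches rows by index label,
# so each frame is indexed by its key column first.
side_by_side = pd.concat([df1.set_index("id"), df2.set_index("id1")], axis=1)
print(side_by_side)
```

Note that concat keeps the union of both indexes (an outer join), while merge with how='left' keeps exactly the rows of df1. The two agree here only because every id1 value also appears in id.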

Another solution is to simply rename the column:

    print(pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id', how='left'))

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0

If you need only the price column, the simplest approach is map:

    df1['price'] = df1.id.map(df2.set_index('id1')['price'])
    print(df1)

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0

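Two more solutions (both start from the original df1, without the price column added by map above; otherwise the merge would produce price_x and price_y):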

    print(pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
            .drop(['id1', 'rating'], axis=1))

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0

    print(pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
            .drop('id1', axis=1))

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
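
To check that these variants really return the same frame, they can be compared directly. The following is a sketch under the assumption that the frames look like the ones in the question; the constructors and the via_* names are illustrative and not part of the original answer.

```python
import pandas as pd

# Frames rebuilt from the question's tables (an assumption for illustration).
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": ["a", "b", "c", "d", "e"],
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Full result: merge on differently named keys vs. rename first, then merge.
via_drop = pd.merge(df1, df2, left_on="id", right_on="id1",
                    how="left").drop("id1", axis=1)
via_rename = pd.merge(df1, df2.rename(columns={"id1": "id"}),
                      on="id", how="left")
pd.testing.assert_frame_equal(via_drop, via_rename)

# Price only: map vs. merge on a column subset.
# assign returns a new frame, so df1 itself is left unchanged here.
via_map = df1.assign(price=df1["id"].map(df2.set_index("id1")["price"]))
via_subset = pd.merge(df1, df2[["id1", "price"]], left_on="id",
                      right_on="id1", how="left").drop("id1", axis=1)
pd.testing.assert_frame_equal(via_map, via_subset)
pd.testing.assert_frame_equal(via_map, via_drop.drop("rating", axis=1))
print("all variants agree")
```

Using assign instead of df1['price'] = ... is a deliberate choice in this sketch: it keeps df1 free of a price column, so the later merges do not end up with duplicated price columns.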

jezrael answered Sep 22 '22 21:09


join merges on the index unless we specify a column to use instead. However, we can only specify a column in place of the index for the 'left' dataframe.

Strategy:

  • set_index on df2 to be id1
  • use join with df as the left dataframe and id as the on parameter. Note that I could have called set_index('id') on df to avoid having to use the on parameter. However, passing on allowed me to leave the column in the dataframe rather than having to reset_index later.

    df.join(df2.set_index('id1'), on='id')

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0

If you only want price from df2:

    df.join(df2.set_index('id1')[['price']], on='id')

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
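
For comparison, the alternative mentioned in the strategy (indexing both frames and then calling reset_index) might look like the sketch below. The frame constructors, the names left, right and joined, and the rename_axis call are assumptions added for illustration; rename_axis is there only so both indexes share the name id before joining.

```python
import pandas as pd

# Frames rebuilt from the question's tables (an assumption for illustration).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": ["a", "b", "c", "d", "e"],
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Index both frames by their key so join can match index to index.
left = df.set_index("id")
right = df2.set_index("id1").rename_axis("id")

# join defaults to a left join; reset_index turns id back into a column.
joined = left.join(right).reset_index()

# Same frame as the on='id' version, which never moves id into the index.
expected = df.join(df2.set_index("id1"), on="id")
pd.testing.assert_frame_equal(joined, expected)
print(joined)
```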

piRSquared answered Sep 18 '22 21:09