I have a dataframe df:
id name count 1 a 10 2 b 20 3 c 30 4 d 40 5 e 50
Here I have another dataframe df2:
id1 price rating 1 100 1.0 2 200 2.0 3 300 3.0 5 500 5.0
I want to join these two dataframes on column id and id1(both refer same). Here is an example of df3:
id name count price rating 1 a 10 100 1.0 2 b 20 200 2.0 3 c 30 300 3.0 4 d 40 Nan Nan 5 e 50 500 5.0
Should I use df.merge or pd.concat?
To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
You can use pd. concat([df1, df2, df3, df4], axis=1) to concat vertically.
Use merge
:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1)) id name count price rating 0 1 a 10 100.0 1.0 1 2 b 20 200.0 2.0 2 3 c 30 300.0 3.0 3 4 d 40 NaN NaN 4 5 e 50 500.0 5.0
Another solution is simple rename column:
print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id', how='left')) id name count price rating 0 1 a 10 100.0 1.0 1 2 b 20 200.0 2.0 2 3 c 30 300.0 3.0 3 4 d 40 NaN NaN 4 5 e 50 500.0 5.0
If need only column price
the simpliest is map
:
df1['price'] = df1.id.map(df2.set_index('id1')['price']) print (df1) id name count price 0 1 a 10 100.0 1 2 b 20 200.0 2 3 c 30 300.0 3 4 d 40 NaN 4 5 e 50 500.0
Another 2 solutions:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left') .drop(['id1', 'rating'], axis=1)) id name count price 0 1 a 10 100.0 1 2 b 20 200.0 2 3 c 30 300.0 3 4 d 40 NaN 4 5 e 50 500.0
print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left') .drop('id1', axis=1)) id name count price 0 1 a 10 100.0 1 2 b 20 200.0 2 3 c 30 300.0 3 4 d 40 NaN 4 5 e 50 500.0
join
utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left'
dataframe.
Strategy:
set_index
on df2
to be id1
join
with df
as the left dataframe and id
as the on
parameter. Note that I could have set_index('id')
on df
to avoid having to use the on
parameter. However, this allowed me leave the column in the dataframe rather than having to reset_index later.df.join(df2.set_index('id1'), on='id') id name count price rating 0 1 a 10 100.0 1.0 1 2 b 20 200.0 2.0 2 3 c 30 300.0 3.0 3 4 d 40 NaN NaN 4 5 e 50 500.0 5.0
If you only want price
from df2
df.join(df2.set_index('id1')[['price']], on='id') id name count price 0 1 a 10 100.0 1 2 b 20 200.0 2 3 c 30 300.0 3 4 d 40 NaN 4 5 e 50 500.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With