
Difference between pd.merge() and dataframe.merge()

What is the difference between merging with pd.merge() and merging with dataframe.merge()? Examples below:

pd.merge(dataframe1, dataframe2)

and

dataframe1.merge(dataframe2)
Emily Lo asked May 18 '26 19:05



1 Answer

Two functions are available for almost the same task: pandas.merge() and DataFrame.merge().

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, 
          left_index=False, right_index=False, 
          sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, 
             left_index=False, right_index=False, 
             sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
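The two signatures can also be compared programmatically. The following is a minimal sketch using the standard-library inspect module; the variable names are illustrative only, and the exact parameter list depends on the installed pandas version.

```python
import inspect

import pandas as pd

# Parameter names of the module-level function and of the DataFrame method.
func_params = list(inspect.signature(pd.merge).parameters)
method_params = list(inspect.signature(pd.DataFrame.merge).parameters)

# Names accepted by pd.merge() that the method does not take explicitly.
only_in_function = [p for p in func_params if p not in method_params]

print("pd.merge parameters:       ", func_params)
print("DataFrame.merge parameters:", method_params)
print("only in pd.merge:          ", only_in_function)
```

In the method form, the calling DataFrame plays the role of the left argument, so the remaining keyword arguments are expected to line up.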

Both look similar, so what's the advantage of using one over the other?

Internally, DataFrame.merge() delegates to pandas.merge(), passing the calling DataFrame as left, so df1.merge(df2) gives the same result as pd.merge(df1, df2).

However, pd.merge() is a wrapping-style function (calls are nested inside one another), while df1.merge() is a chaining-style method, which makes the latter easier to chain from left to right.

E.g.,

 df1.merge(df2).merge(df3)
 # reads better (analogous to the %>% pipe operator in R) than
 pd.merge(pd.merge(df1, df2), df3)
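To make the chaining comparison concrete, here is a small sketch with three toy frames. The frames and their columns (id, a, b, c) are hypothetical and chosen only so that every pair shares the single key column id.

```python
import pandas as pd

# Hypothetical toy frames that share the key column "id".
df1 = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]})
df2 = pd.DataFrame({"id": [1, 2, 4], "b": [10.0, 20.0, 40.0]})
df3 = pd.DataFrame({"id": [1, 2, 5], "c": [True, False, True]})

# Nested function calls: read from the inside out.
nested = pd.merge(pd.merge(df1, df2), df3)

# Method chain: read from left to right.
chained = df1.merge(df2).merge(df3)

print(chained)
print(nested.equals(chained))
```

Both forms perform the same two inner joins; only the reading order of the code differs.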

Let's look at a reproducible example:

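import pandas as pd
from IPython.display import display  # available by default in Jupyter

# read_html returns a list of DataFrames, one per table found on the page
d1 = pd.read_html('https://worldpopulationreview.com/countries')
pop = d1[0]
pop.info()  # Data for 232 countries for 7 columns
print()

pop.head(3)

d2 = pd.read_html('https://worldpopulationreview.com/country-rankings/median-age')
age = d2[0]
age.info()  # Data for 221 countries for 5 columns
print()

age.head(3)

# With no `on` argument, both calls join on the columns the two tables share
display('pd.merge(): ', pd.merge(pop, age), 'df.merge(): ', pop.merge(age))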
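The same comparison can be tried without network access. The sketch below swaps the scraped tables for small stand-in frames; the column names (country, population, median_age) and all values are made up for illustration and do not come from the pages above.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped tables; the figures are made up.
pop = pd.DataFrame({
    "country": ["A", "B", "C"],
    "population": [100, 200, 300],
})
age = pd.DataFrame({
    "country": ["A", "B", "D"],
    "median_age": [30.5, 41.2, 25.0],
})

# With no `on` argument, both calls join on the shared column "country"
# and default to an inner join.
by_function = pd.merge(pop, age)
by_method = pop.merge(age)

print(by_function)
print(by_function.equals(by_method))
```

Only the countries present in both frames survive the default inner join, and the function and method forms return equal frames.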
Dr Nisha Arora answered May 23 '26 11:05
