 

Merge two DataFrames based on multiple keys in pandas

Does pandas (or another module) have any functions that support merging (or joining) two tables based on multiple keys?

For example, I have two tables (DataFrames) a and b:

    >>> a
    A  B  value1
    1  1      23
    1  2      34
    2  1    2342
    2  2     333
    >>> b
    A  B  value2
    1  1    0.10
    1  2    0.20
    2  1    0.13
    2  2    0.33

The desired result is:

    A  B  value1  value2
    1  1      23    0.10
    1  2      34    0.20
    2  1    2342    0.13
    2  2     333    0.33
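To make the examples in this thread reproducible, here is one way to rebuild a and b from the printed values above. This is a minimal sketch: the question shows only printed output, so the integer dtypes for A and B (and for value1) are an assumption.

```python
import pandas as pd

# Rebuild the question's tables; the pair (A, B) acts as a composite key.
# Assumption: A, B and value1 are integers, value2 is float.
a = pd.DataFrame({'A': [1, 1, 2, 2],
                  'B': [1, 2, 1, 2],
                  'value1': [23, 34, 2342, 333]})
b = pd.DataFrame({'A': [1, 1, 2, 2],
                  'B': [1, 2, 1, 2],
                  'value2': [0.10, 0.20, 0.13, 0.33]})

print(a)
print(b)
```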
Surah Li asked Aug 28 '15 18:08




2 Answers

To merge by multiple keys, you just need to pass the keys in a list to pd.merge:

    >>> pd.merge(a, b, on=['A', 'B'])
       A  B  value1  value2
    0  1  1      23    0.10
    1  1  2      34    0.20
    2  2  1    2342    0.13
    3  2  2     333    0.33

In fact, the default for pd.merge is to use the intersection of the two DataFrames' column labels, so pd.merge(a, b) would work equally well in this case.
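One way to check the claim about the default is to compare the explicit and implicit forms directly. The sketch below rebuilds the question's data (assuming integer keys) so it runs on its own, and also shows the equivalent DataFrame.merge method call.

```python
import pandas as pd

# The question's tables (assumption: integer key columns).
a = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value1': [23, 34, 2342, 333]})
b = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2],
                  'value2': [0.10, 0.20, 0.13, 0.33]})

explicit = pd.merge(a, b, on=['A', 'B'])  # keys named explicitly
implicit = pd.merge(a, b)                 # keys default to the shared columns A and B
method = a.merge(b, on=['A', 'B'])        # same operation as a DataFrame method

# All three are identical here because A and B are the only shared columns.
print(explicit.equals(implicit), explicit.equals(method))
```

Naming the keys explicitly is the safer habit: if both frames later gain another column with the same name, the implicit form will silently start joining on that column too.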

Alex Riley answered Oct 14 '22 10:10


According to the most recent pandas documentation, the on parameter accepts either a single label or a list of labels, and those labels must be found in both DataFrames. Here is an MWE for its use:

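    import pandas as pd

    # Two frames sharing the composite key (A, B), each with a boolean column v.
    a = pd.DataFrame({'A': ['0', '0', '1', '1'],
                      'B': ['0', '1', '0', '1'],
                      'v': [True, False, False, True]})
    b = pd.DataFrame({'A': ['0', '0', '1', '1'],
                      'B': ['0', '1', '0', '1'],
                      'v': [False, True, True, True]})

    # Join on both key columns; the overlapping v columns get the given suffixes.
    result = pd.merge(a, b, on=['A', 'B'], how='inner', suffixes=('_and', '_or'))

    >>> result
       A  B  v_and   v_or
    0  0  0   True  False
    1  0  1  False   True
    2  1  0  False   True
    3  1  1   True   True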
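In the MWE every (A, B) pair appears in both frames, so how='inner' keeps all four rows. The sketch below uses hypothetical frames left and right whose key pairs only partly overlap, to show how the how parameter decides which key combinations survive; indicator=True adds a _merge column recording which side each row came from.

```python
import pandas as pd

# Hypothetical frames: (0, 0) and (0, 1) are in both, (1, 0) only in left, (1, 1) only in right.
left = pd.DataFrame({'A': ['0', '0', '1'], 'B': ['0', '1', '0'],
                     'v': [True, False, False]})
right = pd.DataFrame({'A': ['0', '0', '1'], 'B': ['0', '1', '1'],
                      'v': [False, True, True]})

# Inner join: only key pairs present in both frames are kept.
inner = pd.merge(left, right, on=['A', 'B'], how='inner', suffixes=('_and', '_or'))

# Outer join: every key pair is kept; values from the missing side become NaN.
outer = pd.merge(left, right, on=['A', 'B'], how='outer',
                 suffixes=('_and', '_or'), indicator=True)

print(inner)
print(outer)
```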

    on : label or list
        Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
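The quoted description says on may also name index levels, not just columns. A minimal sketch, assuming a reasonably recent pandas (merging on index level names was added in 0.23), with hypothetical frames a_idx and b_idx that hold the keys as a MultiIndex instead of as columns:

```python
import pandas as pd

# Hypothetical frames where A and B are index levels rather than columns.
a_idx = pd.DataFrame({'A': ['0', '0', '1', '1'], 'B': ['0', '1', '0', '1'],
                      'v': [True, False, False, True]}).set_index(['A', 'B'])
b_idx = pd.DataFrame({'A': ['0', '0', '1', '1'], 'B': ['0', '1', '0', '1'],
                      'v': [False, True, True, True]}).set_index(['A', 'B'])

# 'A' and 'B' resolve to index level names because no columns with those names exist.
result = pd.merge(a_idx, b_idx, on=['A', 'B'], suffixes=('_and', '_or'))

# Levels matched in both frames are kept as index levels in the result;
# reset_index turns them back into ordinary columns.
print(result.reset_index())
```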

See the latest pd.merge documentation for further details.

Miguel Rueda answered Oct 14 '22 11:10