
pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas data frames using two columns:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

but got the following error:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'

Any idea what should be the right way to do this? Thanks!

Edamame asked Oct 04 '22 08:10


2 Answers

Try this:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns

right_on : label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per left_on docs
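
For anyone who wants to check this on a small case, here is a minimal sketch. Only the column names A_c1, B_c1 and c2 come from the question; the sample rows and the a_val / b_val columns are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the question.
A_df = pd.DataFrame({"A_c1": [1, 2, 3], "c2": ["x", "y", "z"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2, 4], "c2": ["x", "y", "z"], "b_val": [100, 200, 400]})

# Each key is its own quoted string inside a real Python list.
# Keys are paired by position: A_c1 with B_c1, c2 with c2.
new_df = pd.merge(
    A_df,
    B_df,
    how="left",
    left_on=["A_c1", "c2"],
    right_on=["B_c1", "c2"],
)
print(new_df)
```

Because this is a left join, every row of A_df is kept. The row with A_c1 == 3 has no partner in B_df (the pair has to match on both keys), so its b_val comes back as NaN.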

Shijo answered Oct 09 '22 04:10


The problem is that putting quotes around the whole list turns it into a single string, so pandas looks for one column literally named '[A_c1,c2]' and raises a KeyError when it cannot find it. As @Shijo quoted from the documentation, left_on and right_on expect a label or a list of labels. To join on several columns, pass a list in which each column name is quoted individually, for both the left and the right data frame. That is why this is incorrect:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

And this is the correct way of using the function:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
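
A small sketch to reproduce the failure, using made-up one-row frames (only the column names come from the question). The quoted version should fail with a KeyError, because the whole string is treated as one column label:

```python
import pandas as pd

# Hypothetical one-row frames; only the column names come from the question.
A_df = pd.DataFrame({"A_c1": [1], "c2": ["x"]})
B_df = pd.DataFrame({"B_c1": [1], "c2": ["x"]})

raised = False
try:
    # The whole "list" is a single string, so pandas treats it as one label.
    pd.merge(A_df, B_df, how="left", left_on="[A_c1,c2]", right_on="[B_c1,c2]")
except KeyError as err:
    raised = True
    print("KeyError:", err)

# With a real list of individually quoted names, the merge goes through.
new_df = pd.merge(A_df, B_df, how="left", left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```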
Celius Stingher answered Oct 09 '22 05:10