I am trying to join two pandas data frames using two columns:
new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
but got the following error:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: '[B_1, c2]'
Any idea what should be the right way to do this? Thanks!
To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.
By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame.
Key Points Pandas' merge and concat can be used to combine subsets of a DataFrame, or even data from different files. join function combines DataFrames based on index or column. Joining two DataFrames can be done in multiple ways (left, right, and inner) depending on what data must be in the final DataFrame.
We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .
Must be found in both the left and right DataFrame objects. The data frames must have same column names on which the merging happens. Merge () Function in pandas is similar to database join operation in SQL. Inner Join or Natural join: To keep only rows that match from the data frames, specify the argument how= ‘inner’.
Pandas .join (): Combining Data on a Column or Index. While merge () is a module function, .join () is an object function that lives on your DataFrame. This enables you to specify only one DataFrame, which will join the DataFrame you call .join () on.
Often you may want to merge two pandas DataFrames on multiple columns. Fortunately this is easy to do using the pandas merge()function, which uses the following syntax: pd.merge(df1, df2, left_on=['col1','col2'], right_on = ['col1','col2'])
You have now learned the three most important techniques for combining data in Pandas: merge () for combining data on common columns or indices. .join () for combining data on a key column or an index. concat () for combining DataFrames across rows or columns.
Try this
new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs
the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name must individually be within apostrophes. With what has been stated, we can understand why this is inccorect:
new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
And this is the correct way of using the function:
new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With