Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join two DataFrames on one key column / ERROR: 'columns overlap but no suffix specified'

I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column. The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)

enter image description here

enter image description here

I was trying to add the equivalent 'Description' to each 'PART NUMBER' on the sales table, and I followed the examples from the pandas website my code

sales.join(part_table, on='PART NUMBER')

But I got this error:

ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')

Can someone explain what this error means and how to solve it?

Many thanks!

like image 549
Yumi Avatar asked Sep 24 '14 22:09

Yumi


People also ask

How do I join two DataFrames in a column?

We can join columns from two Dataframes using the merge() function. This is similar to the SQL 'join' functionality. A detailed discussion of different join types is given in the SQL lesson. You specify the type of join you want using the how parameter.

What is the difference between merge and join in pandas?

Both join and merge can be used to combines two dataframes but the join method combines two dataframes on the basis of their indexes whereas the merge method is more versatile and allows us to specify columns beside the index to join on for both dataframes.

How do I merge two pandas DataFrames?

Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.

How do I select a specific column in pandas?

Use DataFrame. loc[] and DataFrame. iloc[] to select a single column or multiple columns from pandas DataFrame by column names/label or index position respectively.


1 Answers

I think you want to do a merge rather than a join:

sales.merge(part_table)

Here's an example dataframe:

In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')

In [14]: dfa.merge(dfb)
Out[14]:
   A  B  C
0  1  2  a
1  3  4  b
2  3  4  c

It's unclear from the docs if this is intentational (I thought that on would be used as the column) but following the exceptions message if you add suffixs we can see what's going on:

In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
   A_a  C  A_b   B
0    1  a    3   4
1    3  b  NaN NaN
2    3  c  NaN NaN

In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
   A_a  C  A_b   B
0    1  a    1   2
1    3  b    3   4
2    3  c  NaN NaN

It's ignoring the on kwarg and just doing the join.

like image 131
Andy Hayden Avatar answered Nov 15 '22 00:11

Andy Hayden