I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column. The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)
I was trying to add the equivalent 'Description' to each 'PART NUMBER' on the sales table, and I followed the examples from the pandas website my code
sales.join(part_table, on='PART NUMBER')
But I got this error:
ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')
Can someone explain what this error means and how to solve it?
Many thanks!
We can join columns from two Dataframes using the merge() function. This is similar to the SQL 'join' functionality. A detailed discussion of different join types is given in the SQL lesson. You specify the type of join you want using the how parameter.
Both join and merge can be used to combines two dataframes but the join method combines two dataframes on the basis of their indexes whereas the merge method is more versatile and allows us to specify columns beside the index to join on for both dataframes.
Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
Use DataFrame. loc[] and DataFrame. iloc[] to select a single column or multiple columns from pandas DataFrame by column names/label or index position respectively.
I think you want to do a merge rather than a join:
sales.merge(part_table)
Here's an example dataframe:
In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])
In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')
In [14]: dfa.merge(dfb)
Out[14]:
A B C
0 1 2 a
1 3 4 b
2 3 4 c
It's unclear from the docs if this is intentational (I thought that on
would be used as the column) but following the exceptions message if you add suffixs we can see what's going on:
In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
A_a C A_b B
0 1 a 3 4
1 3 b NaN NaN
2 3 c NaN NaN
In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
A_a C A_b B
0 1 a 1 2
1 3 b 3 4
2 3 c NaN NaN
It's ignoring the on kwarg and just doing the join.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With