Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas merge on index column? [duplicate]

Tags:

python

pandas

In [88]: c
Out[88]: 
                       Address    Name
CustomerID                            
10            Address for Mike    Mike
11          Address for Marcia  Marcia

In [89]: c.index
Out[89]: Int64Index([10, 11], dtype='int64', name='CustomerID')

In [90]: orders
Out[90]: 
   CustomerID   OrderDate
0          10  2014-12-01
1          11  2014-12-01
2          10  2014-12-01

In [91]: orders.index
Out[91]: RangeIndex(start=0, stop=3, step=1)

In [92]: c.merge(orders)
---------------------------
MergeError: No common columns to perform merge on

So panda can't merge if index column in one dataframe has the same name as another column in a second dataframe?

like image 810
DmitrySemenov Avatar asked Dec 08 '22 17:12

DmitrySemenov


2 Answers

You need to explicitly specify how to join the table. By default, merge will choose common column name as merge key. For your case,

c.merge(orders, left_index=True, right_on='CustomID')

Also, read the docs of pandas.DataFrame.merge please. Hope this would be helpful.

like image 154
rojeeer Avatar answered Jan 04 '23 08:01

rojeeer


The join method does a left join by default (how='left') and joins on the indices of the dataframes. So set the index of the orders dataframe to CustomerId and then join.

# Create sample data.
orders = pd.DataFrame(
    {'CustomerID': [10, 11, 10],
     'OrderDate': ['2014-12-01', '2014-12-01', '2014-12-01']})    
c = pd.DataFrame(
    {'Address': ['Address for Mike', 'Address for Marcia'], 
     'Name': ['Mike', 'Marcia']},
    index=pd.Index([10, 11], dtype='int64', name='CustomerID'))

# Join.
>>> c.join(orders.set_index('CustomerID'))
                       Address    Name   OrderDate
CustomerID                                        
10            Address for Mike    Mike  2014-12-01
10            Address for Mike    Mike  2014-12-01
11          Address for Marcia  Marcia  2014-12-01

Alternatively, this merge will give you the same result. Here, you are joining on the index of c (the left dataframe) and on the CustomerID column in the right dataframe. Ensure to specify how='left' to only join items from the right dataframe to all of the records on the left (leaving an equivalent number of rows matching the length of c). The default behavior for merge is an inner join, wherebe the result only includes those records from c that find a match in orders (although this could be your desired result).

c.merge(orders, left_index=True, right_on='CustomerID', how='left')
like image 25
Alexander Avatar answered Jan 04 '23 09:01

Alexander