Pandas: simple 'join' not working?

Tags: pandas

I like to think I'm not an idiot, but maybe I'm wrong. Can anyone explain to me why this isn't working? I can achieve the desired results using 'merge'. But I eventually need to join multiple pandas DataFrames so I need to get this method working.

In [2]: left = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})

In [3]: right = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})

In [4]: left.join(right, on='ST_NAME', lsuffix='_left', rsuffix='_right')
Out[4]: 
  ST_NAME_left  value ST_NAME_right  value2
0       Oregon  4.685           NaN     NaN
1     Nebraska  2.491           NaN     NaN

Phil asked Apr 11 '12 21:04



2 Answers

Try using merge:

In [14]: right
Out[14]: 
    ST_NAME  value2
0    Oregon   6.218
1  Nebraska   0.001

In [15]: pandas.merge(left, right)
Out[15]: 
    ST_NAME  value  value2
0  Nebraska  2.491   0.001
1    Oregon  4.685   6.218

In [18]: pandas.merge(left, right, on='ST_NAME', sort=False)
Out[18]: 
    ST_NAME  value  value2
0    Oregon  4.685   6.218
1  Nebraska  2.491   0.001
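
Since the question mentions eventually combining more than two DataFrames, here is a minimal sketch of chaining merge over a list of frames with functools.reduce. The frame `third` and its `value3` column are made up for illustration and are not from the question.

```python
from functools import reduce

import pandas

left = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})
right = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})
# Hypothetical third frame, invented for this sketch.
third = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value3': [1.0, 2.0]})

frames = [left, right, third]

# Merge pairwise, left to right, on the shared key column.
merged = reduce(lambda a, b: pandas.merge(a, b, on='ST_NAME', sort=False), frames)
print(merged)
```

Each step is an ordinary two-frame merge, so the usual `how` and `on` arguments apply at every step.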

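DataFrame.join is a bit of a legacy method and doesn't do column-on-column joins. Its on parameter names a column in the calling frame and matches it against the index of the other frame (originally join only did index-on-column this way, hence the "legacy" designation). In the question, the ST_NAME strings in left are looked up in right's default integer index, nothing matches, and every right-hand column comes back NaN.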
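
If join is still wanted, a sketch of one way to make it line up is to move the key into the index of the other frame first. This reuses the frames from the question; `third` and `value3` are hypothetical additions to show the multi-frame form.

```python
import pandas

left = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})
right = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})

# `on` names a column of the calling frame; it is matched against the
# *index* of the other frame, so put the key into right's index first.
joined = left.join(right.set_index('ST_NAME'), on='ST_NAME')
print(joined)

# Hypothetical third frame, invented for this sketch.
third = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value3': [1.0, 2.0]})

# With every frame indexed by the key, join also accepts a list of frames
# (index-on-index; `on` cannot be combined with a list).
multi = left.set_index('ST_NAME').join(
    [right.set_index('ST_NAME'), third.set_index('ST_NAME')]
)
print(multi)
```

No suffixes are needed here because, once ST_NAME is in the index of the other frames, the column names no longer overlap.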

Wes McKinney answered Oct 06 '22 19:10


I can confirm that the pandas join method gave me wrong results. In my case both keys were long strings (18 characters), and the result looked as if pandas was matching only the first few characters. The merge function works properly. Please do not use join; it really should be removed from the available methods, otherwise it can silently corrupt your results.

Donatas Svilpa answered Oct 06 '22 21:10