I like to think I'm not an idiot, but maybe I'm wrong. Can anyone explain to me why this isn't working? I can achieve the desired results using 'merge', but I eventually need to join multiple pandas DataFrames, so I need to get this method working.
In [2]: left = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})
In [3]: right = pandas.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})
In [4]: left.join(right, on='ST_NAME', lsuffix='_left', rsuffix='_right')
Out[4]:
  ST_NAME_left  value ST_NAME_right  value2
0       Oregon  4.685           NaN     NaN
1     Nebraska  2.491           NaN     NaN
DataFrame.join joins DataFrames using their indexes. To join on a key column, set that key as the index of both df and other; the joined DataFrame will then have the key as its index. Alternatively, use the on parameter, which joins a column of the calling DataFrame against the index of other, so other must still be indexed by the key.
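A minimal sketch of both options, reusing the left and right frames from the question (written with the conventional pd alias instead of the bare pandas name used in the session). Either way, right has to be indexed by ST_NAME before the join:

```python
import pandas as pd

left = pd.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})
right = pd.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})

# Option 1: make the key the index on both sides and join index-on-index.
# The result is indexed by ST_NAME.
by_index = left.set_index('ST_NAME').join(right.set_index('ST_NAME'))

# Option 2: keep ST_NAME as a column in `left`, but index `right` by it.
# `on` names a column of the caller; its values are matched against
# the index labels of the other frame.
by_on = left.join(right.set_index('ST_NAME'), on='ST_NAME')

print(by_index)
print(by_on)
```

Because right no longer carries an ST_NAME column after set_index, there are no overlapping column names, so lsuffix and rsuffix are not needed.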
Either pandas.merge() or DataFrame.merge() can be used to merge DataFrames. Merging is similar to an SQL join and supports the join types inner, left, right, outer and cross through the how parameter. Each call combines two DataFrames, so merging several of them means chaining the calls.
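One way to chain merge over several frames is to fold it across a list with functools.reduce. This is a sketch, not the only approach; the third frame and the value3 column are hypothetical, added only so there are more than two inputs:

```python
from functools import reduce

import pandas as pd

left = pd.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value': [4.685, 2.491]})
right = pd.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})
# Hypothetical third frame (not from the question), in a different row order.
third = pd.DataFrame({'ST_NAME': ['Nebraska', 'Oregon'], 'value3': [1.0, 2.0]})

frames = [left, right, third]

# merge combines two frames per call, so reduce applies it pairwise:
# ((left merge right) merge third). how='inner' keeps only the states
# that appear in every frame; switch to 'left' or 'outer' as needed.
merged = reduce(
    lambda acc, df: pd.merge(acc, df, on='ST_NAME', how='inner'),
    frames,
)
print(merged)
```

Rows are matched by ST_NAME values, not by position, so the different row order in third does not matter.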
Try using merge:
In [14]: right
Out[14]:
    ST_NAME  value2
0    Oregon   6.218
1  Nebraska   0.001
In [15]: pandas.merge(left, right)
Out[15]:
    ST_NAME  value  value2
0  Nebraska  2.491   0.001
1    Oregon  4.685   6.218
In [18]: pandas.merge(left, right, on='ST_NAME', sort=False)
Out[18]:
    ST_NAME  value  value2
0    Oregon  4.685   6.218
1  Nebraska  2.491   0.001
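DataFrame.join is a bit of a legacy method and apparently doesn't do column-on-column joins. With the on parameter it matches a column of the calling DataFrame against the index of the other DataFrame (originally it only did index-on-column joins this way, hence the "legacy" designation). In the question, right still has its default integer index 0, 1, so no state name in left's ST_NAME column finds a match and every joined value is NaN.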
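To make the column-on-index behaviour concrete, this sketch prints right's index and then joins on a hypothetical integer key column named row (an illustration only, not from the question) whose values line up with those index labels:

```python
import pandas as pd

right = pd.DataFrame({'ST_NAME': ['Oregon', 'Nebraska'], 'value2': [6.218, 0.001]})

# right's index is the default integer index, not the state names,
# which is why joining left['ST_NAME'] against it matches nothing.
print(right.index.tolist())

# Hypothetical left frame whose key column holds integer index labels of right.
left = pd.DataFrame({'row': [1, 0], 'value': [4.685, 2.491]})

# Each value in left['row'] is looked up among right's index labels:
# row 1 -> Nebraska, row 0 -> Oregon.
joined = left.join(right, on='row')
print(joined)
```

The lookup goes column value to index label; right's ST_NAME column is never used as a key, only carried along as data.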
I can confirm that the pandas join method is faulty. In my case both keys were long strings (18 characters), and the result looked as if pandas was matching only the first couple of characters. The merge function works properly. Please don't use the join function; it really should be removed from the available methods, otherwise it can mess things up badly.