I have two DataFrames in pandas and I'm trying to merge them, but pandas keeps changing the row order. I've tried setting indexes and resetting them, but no matter what I do, I can't get the returned output to have the rows in the same order. Is there a trick? Note that we start out with the loans in the order 'a,b,c', but after the merge it's 'a,c,b'.
import pandas
loans = [ 'a', 'b', 'c' ]
states = [ 'OR', 'CA', 'OR' ]
x = pandas.DataFrame({ 'loan' : loans, 'state' : states })
y = pandas.DataFrame({ 'state' : [ 'CA', 'OR' ], 'value' : [ 1, 2]})
z = x.merge(y, how='left', on='state')
But now the order is no longer the original 'a,b,c'. Any ideas? I'm using pandas version 0.11.
Hopefully someone will provide a better answer, but in case no one does, this will definitely work, so…
Zeroth, I'm assuming you don't want to just end up sorted on loan, but to preserve whatever original order was in x, which may or may not have anything to do with the order of the loan column. (Otherwise, the problem is easier, and less interesting.)
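First, you're asking it to sort based on the join keys. As the docs for your version explain, that's the default when you don't pass a sort argument. (Recent pandas releases default to sort=False, so check the signature for the version you have installed.)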
Second, if you don't sort based on the join keys, the rows will end up grouped together, such that two rows that merged from the same source row end up next to each other, which means you're still going to get a, c, b.
You can work around this by getting the rows grouped together in the order they appear in the original x by just merging again with x (on either side, it doesn't really matter), or by reindexing based on x if you prefer. Like this:
x.merge(x.merge(y, how='left', on='state', sort=False))
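Spelled out with the frames from the question, the double merge looks roughly like the sketch below. It assumes a reasonably recent pandas and that each row of x is unique on its shared columns ('loan' and 'state'); with duplicate rows, the second merge would multiply them. The names merged and z are only illustrative.

```python
import pandas as pd

x = pd.DataFrame({'loan': ['a', 'b', 'c'], 'state': ['OR', 'CA', 'OR']})
y = pd.DataFrame({'state': ['CA', 'OR'], 'value': [1, 2]})

# First merge: attach 'value' to each loan. Depending on the pandas
# version, the rows may or may not come back in x's order.
merged = x.merge(y, how='left', on='state', sort=False)

# Second merge: with no 'on' argument, pandas joins on the columns the
# two frames share ('loan' and 'state'), and the rows follow x, the
# left frame.
z = x.merge(merged)
print(z)
```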
Alternatively, you can cram an x-index in there with reset_index, then just sort on that, like this:
# DataFrame.sort was removed in pandas 0.20; on 0.17 and later use sort_values
x.reset_index().merge(y, how='left', on='state', sort=False).sort_values('index')
Either way obviously seems a bit wasteful, and clumsy… so, as I said, hopefully there's a better answer that I'm just not seeing at the moment. But if not, that works.
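For anyone on a current pandas, here is a minimal sketch of the reset_index variant, written with sort_values and with the helper column dropped afterwards. It assumes x has a default unnamed index, so that reset_index produces a column called 'index'; the stable mergesort is my own addition, so that rows sharing a position keep their relative order if y ever has repeated states.

```python
import pandas as pd

x = pd.DataFrame({'loan': ['a', 'b', 'c'], 'state': ['OR', 'CA', 'OR']})
y = pd.DataFrame({'state': ['CA', 'OR'], 'value': [1, 2]})

z = (
    x.reset_index()                                # original position -> column 'index'
     .merge(y, how='left', on='state', sort=False)
     .sort_values('index', kind='mergesort')       # stable sort back into x's order
     .drop(columns='index')                        # remove the helper column
     .reset_index(drop=True)                       # tidy 0..n-1 index
)
print(z)
```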
I might have a much simpler solution:
df_z = df_x.join(df_y.set_index('state'), on = 'state')
Hope it helps
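Here is a runnable sketch of that join, using the question's data under the names df_x and df_y. DataFrame.join is a left join by default and keeps the calling frame's rows in place, so the loans stay in their original order. It assumes 'state' is unique in df_y; if it were not, matching rows in df_x would be repeated.

```python
import pandas as pd

df_x = pd.DataFrame({'loan': ['a', 'b', 'c'], 'state': ['OR', 'CA', 'OR']})
df_y = pd.DataFrame({'state': ['CA', 'OR'], 'value': [1, 2]})

# Look up each row's 'state' in df_y's index and pull in 'value'.
df_z = df_x.join(df_y.set_index('state'), on='state')
print(df_z)
```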