This may be a bug, but it may also be a subtlety of pandas that I'm missing. I'm combining two dataframes with combine_first and the result's index isn't sorted. What's odd is that I've never before seen combine_first fail to keep the index sorted.
>>> a1
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
>>> a2
                            X   Y
DateTime
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36
>>> a1.combine_first(a2)
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36
>>> a2.combine_first(a1)
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36
I can reproduce this, so I'm happy to try suggestions. Guesses as to what's going on are most welcome.
The combine_first function uses index.union to combine and sort the indexes. The index.union docstring states that it only sorts if possible, so combine_first is not necessarily going to return sorted results by design.
For non-monotonic indexes, index.union tries to sort, but returns unsorted results if sorting raises an exception. I don't know whether this is a bug or not, but index.union does not even attempt to sort monotonic indexes like the datetime index in your example.
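To check what the union step produces on your own install, you can call it directly on two indexes. This is only a sketch: the timestamps are made-up values shaped like the ones in the question (made unique to keep the example simple), and whether the result comes back sorted depends on the pandas version, so the code inspects the order rather than assuming it.

```python
import pandas as pd

# Hypothetical indexes: `later` holds the later timestamps, `earlier` the earlier ones.
later = pd.DatetimeIndex(
    ["2012-11-06 16:00:11.477563", "2012-11-06 16:00:12.477563"], name="DateTime"
)
earlier = pd.DatetimeIndex(
    ["2012-11-06 15:11:09.006507", "2012-11-06 15:11:10.006507"], name="DateTime"
)

# Index.union combines both indexes; the docstring only promises sorting "if possible".
unioned = later.union(earlier)

# Inspect the order instead of assuming it.
print(unioned)
print("sorted:", unioned.is_monotonic_increasing)

# An explicit sort is always available if the union came back unsorted.
unioned_sorted = unioned.sort_values()
```

If the "sorted:" line prints False, you are seeing the same behaviour that combine_first exposes in the question.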
I've opened an issue on GitHub. In the meantime, for any datetime index you can sort explicitly after combining: a2.combine_first(a1).sort_index().
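Here is a minimal sketch of that workaround. The frames are stand-ins for a1 and a2 from the question, with the column values copied from it; the second timestamp in each frame is shifted by one second (an assumption made here so the index is unique and the example stays simple).

```python
import pandas as pd

# Stand-ins for the question's a1 and a2 (second timestamps are hypothetical).
a1 = pd.DataFrame(
    {"X": [8, 8], "Y": [80, 63]},
    index=pd.DatetimeIndex(
        ["2012-11-06 16:00:11.477563", "2012-11-06 16:00:12.477563"], name="DateTime"
    ),
)
a2 = pd.DataFrame(
    {"X": [1, 1], "Y": [37, 36]},
    index=pd.DatetimeIndex(
        ["2012-11-06 15:11:09.006507", "2012-11-06 15:11:10.006507"], name="DateTime"
    ),
)

# Sort explicitly instead of relying on combine_first to return a sorted index.
combined = a2.combine_first(a1).sort_index()

print(combined)
print("sorted:", combined.index.is_monotonic_increasing)
```

Calling sort_index() on a result that is already sorted is harmless, so the extra call is safe to keep even on versions where the bug is fixed.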
Update: This bug is now fixed on GitHub.