
Unexpected behavior when combining two DataFrames in pandas

Tags: python, pandas

This may be a bug, or it may be a subtlety of pandas that I'm missing. I'm combining two dataframes with combine_first and the result's index isn't sorted. What's odd is that, until now, I have never seen combine_first return an unsorted index.

>>> a1
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
>>> a2
                            X   Y
DateTime
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36
>>> a1.combine_first(a2)
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36
>>> a2.combine_first(a1)
                            X   Y
DateTime
2012-11-06 16:00:11.477563  8  80
2012-11-06 16:00:11.477563  8  63
2012-11-06 15:11:09.006507  1  37
2012-11-06 15:11:09.006507  1  36

I can reproduce this reliably, so I'm happy to try suggestions. Guesses as to what's going on are most welcome.

Arthur B. asked Apr 26 '26 14:04


1 Answer

combine_first uses Index.union to combine the two indexes. The Index.union docstring states that it sorts only if possible, so by design combine_first is not guaranteed to return sorted results.

For non-monotonic indexes, Index.union tries to sort and falls back to an unsorted result if sorting raises an exception. I don't know whether this is a bug, but Index.union does not even attempt to sort when the indexes are monotonic, as the datetime indexes in your example are.
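One way to see whether union sorts on a given installation is to inspect the result directly. The following is only a minimal sketch, assuming a recent pandas release; the two timestamps come from the question, and the names later, earlier and merged are illustrative.

```python
import pandas as pd

# Index values borrowed from the question; the variable names are illustrative.
later = pd.DatetimeIndex(["2012-11-06 16:00:11.477563"], name="DateTime")
earlier = pd.DatetimeIndex(["2012-11-06 15:11:09.006507"], name="DateTime")

# Union with the later timestamp on the left, as in a1.combine_first(a2).
merged = later.union(earlier)

# True means union sorted the result; False would match the behavior described above.
print(merged.is_monotonic_increasing)
```

If this prints False on your version, the unsorted combine_first result follows directly from the union step.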

I've opened an issue on GitHub. For now, I'd suggest calling a2.combine_first(a1).sort_index() whenever you combine frames with datetime indexes.
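The workaround above can be sketched as follows. The frames are reconstructed from the question, with one assumption: the second timestamp in each frame is shifted by one second so that the index is unique and the sorted row order is unambiguous. The helper make_frame is hypothetical and exists only for this example.

```python
import pandas as pd

def make_frame(timestamps, x, y):
    """Hypothetical helper: build a frame shaped like the ones in the question."""
    index = pd.DatetimeIndex(timestamps, name="DateTime")
    return pd.DataFrame({"X": x, "Y": y}, index=index)

# Assumption: the second timestamp in each frame is shifted by one second
# (the question repeats the same timestamp) to keep the index unique.
a1 = make_frame(
    ["2012-11-06 16:00:11.477563", "2012-11-06 16:00:12.477563"], [8, 8], [80, 63]
)
a2 = make_frame(
    ["2012-11-06 15:11:09.006507", "2012-11-06 15:11:10.006507"], [1, 1], [37, 36]
)

# Sort explicitly instead of relying on combine_first to return a sorted index.
combined = a2.combine_first(a1).sort_index()
print(combined)
```

The explicit sort_index() call is harmless when the index is already sorted, so it can stay in place regardless of the pandas version.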

Update: This bug has since been fixed on GitHub.

Matti John answered Apr 29 '26 02:04


