
Is the list of levels in a Pandas dataframe guaranteed to be sorted?

When creating a Pandas dataframe with a MultiIndex, the levels seem to always be sorted:

>>> pd.DataFrame([range(4)], columns=pd.MultiIndex.from_product([["b", "a"], [20, 10]]))
   b     a
  20 10 20 10
0  0  1  2  3

>>> _.columns
MultiIndex(levels=[[u'a', u'b'], [10, 20]],
           labels=[[1, 1, 0, 0], [1, 0, 1, 0]])

(Note how levels is sorted.) Is this guaranteed? Knowing this can help write robust code (since we can then rely on a simple property of MultiIndices).

I can't find any guarantee in the documentation (but then this doesn't mean that it couldn't be there!).

There are also old examples (from 2015) that show a different behavior, but perhaps Pandas now guarantees the ordering of levels (in the same way that Python 3.7 and later guarantee the order of keys in dictionaries)?

Eric O Lebigot asked Oct 23 '18 10:10



1 Answer

When you create a MultiIndex with from_product() or from_arrays(), the levels are sorted, because both methods call the internal helper _factorize_from_iterables(), which returns the unique values of each level in sorted order.

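>>> # Internal helper: this import path is assumed for pandas 0.23 and may differ in other versions.
>>> from pandas.core.arrays.categorical import _factorize_from_iterables
>>> list(_factorize_from_iterables([["b", "a"], [20, 10]]))
[[array([1, 0], dtype=int8), array([1, 0], dtype=int8)],
 [Index(['a', 'b'], dtype='object'), Int64Index([10, 20], dtype='int64')]]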

MultiIndex.from_tuples() will also have sorted levels because it uses from_arrays() internally.
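As a quick check of the three constructors mentioned above, the following sketch builds the same index in each way and prints the levels. The helper name levels_as_lists is made up for this illustration, and the behaviour shown is what the constructors do in practice, not a documented guarantee.

```python
import pandas as pd

# The same four entries, deliberately given in descending order.
from_product = pd.MultiIndex.from_product([["b", "a"], [20, 10]])
from_arrays = pd.MultiIndex.from_arrays([["b", "b", "a", "a"], [20, 10, 20, 10]])
from_tuples = pd.MultiIndex.from_tuples([("b", 20), ("b", 10), ("a", 20), ("a", 10)])


def levels_as_lists(index):
    """Hypothetical helper: return the levels of a MultiIndex as plain lists."""
    return [level.tolist() for level in index.levels]


for name, index in [("from_product", from_product),
                    ("from_arrays", from_arrays),
                    ("from_tuples", from_tuples)]:
    # The levels come out sorted, while the entries keep the order they were given in.
    print(name, levels_as_lists(index), index.tolist())
```

Only the levels are sorted here; the order of the entries themselves is untouched, which is why the columns in the question still display as b, a.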

If you call the MultiIndex constructor directly, however, passing the levels and labels yourself, the levels are kept in the order you give them and are not sorted.

>>> import numpy as np
>>> import pandas as pd
>>> midx = pd.MultiIndex(levels=[['b', 'a'], [20, 10]],
...                      labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> df = pd.DataFrame(np.random.randn(4, 4), columns=midx)
>>> df.columns
MultiIndex(levels=[['b', 'a'], [20, 10]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

The session output above was produced with pandas 0.22.0 (released on December 29, 2017) and tested on 0.23.4 (the latest release at the time of writing).
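Since sortedness depends on how the index was built, code that needs sorted levels can check for it instead of assuming it. The sketch below is one possible way to do that: levels_are_sorted is a hypothetical helper, and the example assumes pandas 0.24 or later, where the labels argument of the constructor was renamed to codes.

```python
import pandas as pd

# Built directly through the constructor, so the levels stay in the given order.
# Assumption: pandas >= 0.24, where `labels` is called `codes`.
midx = pd.MultiIndex(levels=[["b", "a"], [20, 10]],
                     codes=[[0, 0, 1, 1], [0, 1, 0, 1]])


def levels_are_sorted(index):
    """Hypothetical helper: True if every level of the MultiIndex is in ascending order."""
    return all(level.is_monotonic_increasing for level in index.levels)


print(levels_are_sorted(midx))  # the levels were passed in descending order

# Rebuilding from the tuples goes through from_tuples(), which factorizes the
# values again, so the new index has the same entries but sorted levels.
rebuilt = pd.MultiIndex.from_tuples(midx.tolist())
print(levels_are_sorted(rebuilt), rebuilt.equals(midx))
```

Rebuilding changes only the internal levels and codes; the entries and their order stay the same, so the rebuilt index can replace the original wherever sorted levels are required.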

user3471881 answered Oct 13 '22 02:10
