
Sorting Multi-Index to full depth (Pandas)

Tags: python, pandas

I have a dataframe which I'm loading from a CSV file, and I then set the index to a few of its columns (usually two or three) with the set_index method. The idea is to then access parts of the dataframe using several key combinations, like this:

df.set_index(['fileName','phrase'])
df.ix['somePath','somePhrase']

Apparently, this type of selection with multiple keys is only possible if the MultiIndex of the dataframe is sorted to sufficient depth. In this case, since I'm supplying two keys, the .ix operation will succeed only if the MultiIndex is sorted to a depth of at least 2.

For some reason, when I set the index as shown, both levels look sorted to me, yet df.index.lexsort_depth returns 1, and I get the following error when trying to access with two keys:

MultiIndex lexsort depth 1, key was length 2

Any help?

idoda asked Nov 14 '13 15:11




1 Answer

It's not really clear what you are asking. Multi-index docs are here

The OP needs to set the index, then sort in place:

df.set_index(['fileName','phrase'],inplace=True)
df.sortlevel(inplace=True)

Then access these levels via a tuple to get a specific result:

df.ix[('somePath','somePhrase')]
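
For readers on a current pandas release, where .ix and sortlevel are no longer available, the same three steps can be sketched with sort_index and .loc. This is only a minimal sketch under assumptions: the file names, phrases, and the count column below are made up for illustration and stand in for the OP's CSV.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV from the question;
# the values and the "count" column are illustrative only.
df = pd.DataFrame({
    "fileName": ["b.txt", "a.txt", "b.txt", "a.txt"],
    "phrase": ["world", "hello", "hello", "world"],
    "count": [4, 1, 3, 2],
})

# Step 1: move the two columns into a MultiIndex.
df = df.set_index(["fileName", "phrase"])

# Step 2: sort the index across all levels (lexicographic sort).
df = df.sort_index()

# Step 3: select one row by passing the full key as a tuple.
row = df.loc[("a.txt", "world")]
print(row["count"])
```

Reassigning the result of set_index and sort_index avoids inplace=True, which is mostly a matter of style here; the effect on df is the same.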

Maybe just give a toy example like this one and show the specific result you want to get.

In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
   ...:           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

In [2]: df = DataFrame(randn(8, 4), index=arrays)

In [3]: df
Out[3]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
    two  0.308995  1.219156 -0.906315  1.555925
baz one -0.180826 -1.951569  1.617950 -1.401658
    two  0.399151 -1.305852  1.530370 -0.132802
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
qux one -0.656487 -0.154881  0.495044 -1.380583
    two  0.274045 -0.070566  1.274355  1.172247

In [4]: df.index.lexsort_depth
Out[4]: 2

In [5]: df.ix[('foo','one')]
Out[5]: 
0    1.097562
1    0.097126
2    0.387418
3    0.106769
Name: (foo, one), dtype: float64

In [6]: df.ix['foo']
Out[6]: 
            0         1         2         3
one  1.097562  0.097126  0.387418  0.106769
two  0.465681  0.270120 -0.387639 -0.142705

In [7]: df.ix[['foo']]
Out[7]: 
                0         1         2         3
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705

In [8]: df.sortlevel(level=1)
Out[8]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
baz one -0.180826 -1.951569  1.617950 -1.401658
foo one  1.097562  0.097126  0.387418  0.106769
qux one -0.656487 -0.154881  0.495044 -1.380583
bar two  0.308995  1.219156 -0.906315  1.555925
baz two  0.399151 -1.305852  1.530370 -0.132802
foo two  0.465681  0.270120 -0.387639 -0.142705
qux two  0.274045 -0.070566  1.274355  1.172247

In [10]: df.sortlevel(level=1).index.lexsort_depth
Out[10]: 0
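
The last two steps of the session show that sorting on the second level alone leaves the index unsorted as a whole. A rough way to reproduce that check on a current pandas release is sketched below. It uses is_monotonic_increasing as the sortedness test instead of lexsort_depth, and sort_index instead of sortlevel; both substitutions are my assumption about the closest modern equivalents, and the random seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Same two-level index as in the toy example above.
arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
rng = np.random.default_rng(0)  # arbitrary fixed seed, for repeatability
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# Sorting on level 1 first groups all "one" rows, then all "two" rows,
# so the index tuples are no longer in lexicographic order.
by_second = df.sort_index(level=1)

# Sorting with no level argument restores the full lexicographic order.
full = by_second.sort_index()

print(df.index.is_monotonic_increasing)
print(by_second.index.is_monotonic_increasing)
print(full.index.is_monotonic_increasing)

# The fully sorted frame returns the same row for a two-key lookup.
print(full.loc[("foo", "one")].equals(df.loc[("foo", "one")]))
```

If the check on your own frame prints False after set_index, calling sort_index() with no arguments before the lookup is the usual fix.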
Jeff answered Sep 18 '22 13:09
