Sorting Multi-Index to full depth (Pandas)

Tags: python, pandas

I have a dataframe which I'm loading from a CSV file and then setting the index to a few of its columns (usually two or three) with the set_index method. The idea is to then access parts of the dataframe using a combination of several keys, like this:

df.set_index(['fileName','phrase'])
df.ix['somePath','somePhrase']

Apparently, this type of selection with multiple keys is only possible if the MultiIndex of the dataframe is sorted to sufficient depth. In this case, since I'm supplying two keys, the .ix operation succeeds only if the dataframe's MultiIndex is sorted to a depth of at least 2.

For some reason, when I set the index as shown, both levels look sorted to me, yet df.index.lexsort_depth returns 1, and I get the following error when trying to access with two keys:

MultiIndex lexsort depth 1, key was length 2

Any help?


asked Nov 14 '13 15:11

idoda

1 Answer

It's not really clear what you are asking. See the MultiIndex section of the pandas documentation.

The OP needs to set the index, then sort it in place:

df.set_index(['fileName','phrase'],inplace=True)
df.sortlevel(inplace=True)

Then access these levels via a tuple to get a specific result:

df.ix[('somePath','somePhrase')]
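For readers on newer pandas releases, where sortlevel and .ix are no longer available, the same two steps can be sketched with sort_index and .loc. This is only a rough equivalent under assumptions: the small frame below is made-up data standing in for the CSV from the question, and the "count" column is hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the CSV in the question;
# only the column names fileName and phrase come from the post.
df = pd.DataFrame({
    "fileName": ["b.txt", "a.txt", "b.txt", "a.txt"],
    "phrase": ["world", "hello", "hello", "world"],
    "count": [4, 1, 3, 2],
})

# Set the two columns as a MultiIndex, then sort all levels lexicographically.
df = df.set_index(["fileName", "phrase"]).sort_index()

# With a fully sorted index, a tuple of keys selects a single row.
row = df.loc[("a.txt", "hello")]
print(row["count"])
```

Chaining set_index and sort_index and reassigning the result avoids inplace=True, which is a style choice rather than a requirement.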

Maybe just give a toy example like this one and show the specific result you want to get.

In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
   ...:           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

In [2]: df = DataFrame(randn(8, 4), index=arrays)

In [3]: df
Out[3]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
    two  0.308995  1.219156 -0.906315  1.555925
baz one -0.180826 -1.951569  1.617950 -1.401658
    two  0.399151 -1.305852  1.530370 -0.132802
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
qux one -0.656487 -0.154881  0.495044 -1.380583
    two  0.274045 -0.070566  1.274355  1.172247

In [4]: df.index.lexsort_depth
Out[4]: 2

In [5]: df.ix[('foo','one')]
Out[5]: 
0    1.097562
1    0.097126
2    0.387418
3    0.106769
Name: (foo, one), dtype: float64

In [6]: df.ix['foo']
Out[6]: 
            0         1         2         3
one  1.097562  0.097126  0.387418  0.106769
two  0.465681  0.270120 -0.387639 -0.142705

In [7]: df.ix[['foo']]
Out[7]: 
                0         1         2         3
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705

In [8]: df.sortlevel(level=1)
Out[8]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
baz one -0.180826 -1.951569  1.617950 -1.401658
foo one  1.097562  0.097126  0.387418  0.106769
qux one -0.656487 -0.154881  0.495044 -1.380583
bar two  0.308995  1.219156 -0.906315  1.555925
baz two  0.399151 -1.305852  1.530370 -0.132802
foo two  0.465681  0.270120 -0.387639 -0.142705
qux two  0.274045 -0.070566  1.274355  1.172247

In [10]: df.sortlevel(level=1).index.lexsort_depth
Out[10]: 0
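The last two steps show that sorting on the inner level alone breaks the lexicographic order of the index. A minimal sketch of the same check on newer pandas versions follows, assuming that is_monotonic_increasing is an acceptable stand-in for inspecting lexsort_depth, and using np.arange data in place of the random values above.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
# Deterministic values instead of randn, so the example is repeatable.
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=arrays)

# Sorting with the inner level first puts all 'one' rows before all 'two' rows,
# so the index as a whole is no longer in lexicographic order.
by_inner = df.sort_index(level=1)
print(by_inner.index.is_monotonic_increasing)

# Sorting on all levels restores the fully sorted index.
restored = by_inner.sort_index()
print(restored.index.is_monotonic_increasing)
```

Calling sort_index() with no level argument sorts on every level, which is what multi-key lookups need.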

answered Sep 18 '22 13:09

Jeff
