I have a dataframe which Im loading from a csv file and then setting the index to few of its columns (usually two or three) by the set_index
method. The idea is to then access parts of the dataframe using several key combination, as such:
df.set_index(['fileName','phrase'])
df.ix['somePath','somePhrase']
Apparently, this type of selection with multiple keys is only possible if the MultiIndex
of the dataframe is sorted to sufficient depth. In this case, since im supplying two keys, the .ix
operation will not fail only if the dataframe MultiIndex
is sorted to depth of at least 2.
for some reason, when Im setting the index as shown, while to me it seems both layers are sorted, calling df.index.lexsort_depth
command returns 1
, and I get the following error when trying to access with two keys:
MultiIndex lexsort depth 1, key was length 2
Any help?
However, to sort MultiIndex at a specific level, use the multiIndex. sortlevel() method in Pandas. Set the level as an argument. To sort in descending order, use the ascending parameter and set to False.
Flatten columns: use get_level_values() Flatten columns: use to_flat_index() Flatten columns: join column labels. Flatten rows: flatten all levels.
pandas MultiIndex to ColumnsUse pandas DataFrame. reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero.
Its not really clear what you are asking. Multi-index docs are here
The OP needs to set the index, then sort in place
df.set_index(['fileName','phrase'],inplace=True)
df.sortlevel(inplace=True)
Then access these levels via a tuple to get a specific result
df.ix[('somePath','somePhrase')]
Maybe just give a toy example like this and show I want to get a specific result.
In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'])
...: .....: ,
...: .....: np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
...: .....: ]
In [2]: df = DataFrame(randn(8, 4), index=arrays)
In [3]: df
Out[3]:
0 1 2 3
bar one 1.654436 0.184326 -2.337694 0.625120
two 0.308995 1.219156 -0.906315 1.555925
baz one -0.180826 -1.951569 1.617950 -1.401658
two 0.399151 -1.305852 1.530370 -0.132802
foo one 1.097562 0.097126 0.387418 0.106769
two 0.465681 0.270120 -0.387639 -0.142705
qux one -0.656487 -0.154881 0.495044 -1.380583
two 0.274045 -0.070566 1.274355 1.172247
In [4]: df.index.lexsort_depth
Out[4]: 2
In [5]: df.ix[('foo','one')]
Out[5]:
0 1.097562
1 0.097126
2 0.387418
3 0.106769
Name: (foo, one), dtype: float64
In [6]: df.ix['foo']
Out[6]:
0 1 2 3
one 1.097562 0.097126 0.387418 0.106769
two 0.465681 0.270120 -0.387639 -0.142705
In [7]: df.ix[['foo']]
Out[7]:
0 1 2 3
foo one 1.097562 0.097126 0.387418 0.106769
two 0.465681 0.270120 -0.387639 -0.142705
In [8]: df.sortlevel(level=1)
Out[8]:
0 1 2 3
bar one 1.654436 0.184326 -2.337694 0.625120
baz one -0.180826 -1.951569 1.617950 -1.401658
foo one 1.097562 0.097126 0.387418 0.106769
qux one -0.656487 -0.154881 0.495044 -1.380583
bar two 0.308995 1.219156 -0.906315 1.555925
baz two 0.399151 -1.305852 1.530370 -0.132802
foo two 0.465681 0.270120 -0.387639 -0.142705
qux two 0.274045 -0.070566 1.274355 1.172247
In [10]: df.sortlevel(level=1).index.lexsort_depth
Out[10]: 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With