I've set up a DataFrame with two indices. But slicing doesn't behave as expected. I realize that this is a very basic problem, so I searched for similar questions:
pandas: slice a MultiIndex by range of secondary index
Python Pandas slice multiindex by second level index (or any other level)
I also looked at the corresponding documentation.
Strangely, none of the proposed solutions work for me. I've set up a simple example to showcase the problem:
import pandas as pd

# this is my DataFrame
frame = pd.DataFrame([
{"a":1, "b":1, "c":"11"},
{"a":1, "b":2, "c":"12"},
{"a":2, "b":1, "c":"21"},
{"a":2, "b":2, "c":"22"},
{"a":3, "b":1, "c":"31"},
{"a":3, "b":2, "c":"32"}])
# now set a and b as multiindex
frame = frame.set_index(["a","b"])
Now I'm trying different ways of slicing the frame. The first two lines work, the third throws an exception:
# selecting a specific cell works
frame.loc[1,2]
# slicing along the second index works
frame.loc[1,:]
# slicing along the first doesn't work
frame.loc[:,1]
It's a TypeError:
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
Solution 1: Using tuples of slices
This is proposed in this question: pandas: slice a MultiIndex by range of secondary index
Indeed, you can pass a slice for each level
But that doesn't work for me, the same type error as above is produced.
frame.loc[(slice(1,2), 1)]
Solution 2: Using IndexSlice
Python Pandas slice multiindex by second level index (or any other level)
Use an indexer to slice arbitrary values in arbitrary dimensions
Again, that doesn't work for me, it produces the same type error.
frame.loc[pd.IndexSlice[:,2]]
I don't understand how this TypeError can be produced. After all, I can use integers to select specific cells, and ranges along the second dimension work fine. Googling for my specific error message doesn't really help. For example, here someone tries to use integers to slice along an index of type float: https://github.com/pandas-dev/pandas/issues/12333
I tried explicitly converting my indices to int, in case the numpy backend stores everything as float by default. But that didn't change anything; afterwards the same errors as above appear:
# note: this only works while a and b are still columns, i.e. before set_index
frame["a"] = frame["a"].apply(lambda x: int(x))
frame["b"] = frame["b"].apply(lambda x: int(x))
type(frame["b"][0])  # it's numpy.int64
IIUC, you just have to specify : for the columns when indexing a multi-index DF:
In [40]: frame.loc[pd.IndexSlice[:,2], :]
Out[40]:
      c
a b
1 2  12
2 2  22
3 2  32
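To expand on why the explicit column indexer helps: in Python, frame.loc[(slice(1,2), 1)] is the same call as frame.loc[slice(1,2), 1], so pandas can read the second element as a column label instead of a value from the second index level. This reading is my assumption about the cause of the error; it matches the pandas documentation's advice to specify all axes in .loc when the rows are a MultiIndex. A minimal sketch, rebuilding the frame from the question (the variable names by_indexslice, by_tuple, by_xs and first_level_range are only illustrative):

```python
import pandas as pd

# Rebuild the example frame from the question.
frame = pd.DataFrame([
    {"a": 1, "b": 1, "c": "11"},
    {"a": 1, "b": 2, "c": "12"},
    {"a": 2, "b": 1, "c": "21"},
    {"a": 2, "b": 2, "c": "22"},
    {"a": 3, "b": 1, "c": "31"},
    {"a": 3, "b": 2, "c": "32"},
]).set_index(["a", "b"])

idx = pd.IndexSlice

# All rows where the second level (b) equals 2; the trailing ":" selects all columns.
by_indexslice = frame.loc[idx[:, 2], :]

# The same selection written with an explicit tuple of slices.
by_tuple = frame.loc[(slice(None), 2), :]

# Cross-section on level "b"; drop_level=False keeps both index levels.
by_xs = frame.xs(2, level="b", drop_level=False)

# A label range on the first level (inclusive on both ends) combined with b == 1.
first_level_range = frame.loc[idx[1:2, 1], :]

print(by_indexslice)
print(first_level_range)
```

All three of the first selections should return the same three rows. The important part is that the row indexer (the tuple or IndexSlice) and the column indexer (:) are passed as two separate arguments to .loc.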