
pandas: slicing along first level of multiindex

I've set up a DataFrame with a two-level MultiIndex, but slicing doesn't behave as expected. I realize that this is a very basic problem, so I searched for similar questions:

pandas: slice a MultiIndex by range of secondary index

Python Pandas slice multiindex by second level index (or any other level)

I also looked at the corresponding documentation.

Strangely, none of the proposed solutions works for me. I've set up a simple example to showcase the problem:

import pandas as pd

# this is my DataFrame
frame = pd.DataFrame([
{"a":1, "b":1, "c":"11"},
{"a":1, "b":2, "c":"12"},
{"a":2, "b":1, "c":"21"},
{"a":2, "b":2, "c":"22"},
{"a":3, "b":1, "c":"31"},
{"a":3, "b":2, "c":"32"}])

# now set a and b as multiindex
frame = frame.set_index(["a","b"])

Now I'm trying different ways of slicing the frame. The first two statements work; the third throws an exception:

# selecting a specific cell works
frame.loc[1,2]

# slicing along the second index works
frame.loc[1,:]

# slicing along the first doesn't work
frame.loc[:,1]

It's a TypeError:

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

Solution 1: Using tuples of slices

This is proposed in this question: pandas: slice a MultiIndex by range of secondary index

Indeed, you can pass a slice for each level

But that doesn't work for me; it produces the same TypeError as above.

frame.loc[(slice(1,2), 1)]

Solution 2: Using IndexSlice

Python Pandas slice multiindex by second level index (or any other level)

Use an indexer to slice arbitrary values in arbitrary dimensions

Again, that doesn't work for me; it produces the same TypeError.

frame.loc[pd.IndexSlice[:,2]]

I don't understand how this TypeError can be produced. After all, I can use integers to select specific cells, and ranges along the second dimension work fine. Googling for my specific error message doesn't really help. For example, here someone tries to use integers to slice along an index of type float: https://github.com/pandas-dev/pandas/issues/12333

I tried explicitly converting my indices to int, in case the numpy backend stores everything as float by default. But that didn't change anything; the same errors as above still appear afterwards:

# this only runs while "a" and "b" are still columns, i.e. before set_index
frame["a"] = frame["a"].astype(int)
frame["b"] = frame["b"].astype(int)

type(frame["b"][0])  # it's numpy.int64
asked Feb 04 '23 04:02 by lhk


1 Answer

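IIUC, you just have to specify : for the columns when indexing a DataFrame with a MultiIndex. Otherwise .loc can read the second element of the key as a column label instead of a value of the second index level: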

In [40]: frame.loc[pd.IndexSlice[:,2], :]
Out[40]:
      c
a b
1 2  12
2 2  22
3 2  32
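
The same idea should carry over to the other expressions from the question. Below is a minimal sketch, assuming the frame is built exactly as in the question; the alias idx and the result names by_indexslice, by_tuple and by_xs are only illustrative. DataFrame.xs is not part of the answer above, it is shown as one possible alternative for picking a single value of one level.

```python
import pandas as pd

# Same frame as in the question, with "a" and "b" as the two index levels
frame = pd.DataFrame([
    {"a": 1, "b": 1, "c": "11"},
    {"a": 1, "b": 2, "c": "12"},
    {"a": 2, "b": 1, "c": "21"},
    {"a": 2, "b": 2, "c": "22"},
    {"a": 3, "b": 1, "c": "31"},
    {"a": 3, "b": 2, "c": "32"},
]).set_index(["a", "b"])

idx = pd.IndexSlice  # illustrative alias

# All values of "a", only b == 1. The trailing ":" is the column indexer,
# so the IndexSlice key is applied to the row MultiIndex only.
by_indexslice = frame.loc[idx[:, 1], :]

# Tuple-of-slices form: a from 1 to 2 (inclusive), b == 1, all columns
by_tuple = frame.loc[(slice(1, 2), 1), :]

# Alternative: cross-section on level "b"; by default the selected
# level is dropped from the result's index
by_xs = frame.xs(2, level="b")

print(by_indexslice)
print(by_tuple)
print(by_xs)
```

Note that label slices such as slice(1, 2) on a MultiIndex rely on the index being sorted, which is the case for this frame.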
answered Feb 06 '23 19:02 by MaxU - stop WAR against UA