
Pandas IndexSlice fails with pd.style

Tags: python, pandas

Given this dataframe:

In [1]: import numpy as np
        import pandas as pd
        df = pd.DataFrame(np.random.rand(4,4),
                          index=['A','B','C','All'],
                          columns=[2011,2012,2013,'All']).round(2)
        print(df)
Out[1]:

     2011  2012  2013   All
A    0.94  0.17  0.06  0.64
B    0.49  0.16  0.43  0.64
C    0.16  0.20  0.22  0.37
All  0.94  0.04  0.72  0.18

I'm trying to use pd.style to format the output of a dataframe. One of its keywords is subset, which defines where a formatting rule is applied (for example, highlight max). The documentation for pd.style suggests using pd.IndexSlice for this:

The value passed to subset behaves similar to slicing a DataFrame.

  • A scalar is treated as a column label
  • A list (or series or numpy array) is treated as multiple column labels
  • A tuple is treated as (row_indexer, column_indexer)

Consider using pd.IndexSlice to construct the tuple for the last one.
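
To make the three forms concrete, here is a minimal sketch of the selections they roughly correspond to. It assumes that subset is resolved with label-based, .loc-style indexing, and it hard-codes the values from the printed frame above instead of using random data. The variable names are illustrative only.

```python
import pandas as pd

# Values copied from the printed frame so the example is reproducible.
df = pd.DataFrame(
    [[0.94, 0.17, 0.06, 0.64],
     [0.49, 0.16, 0.43, 0.64],
     [0.16, 0.20, 0.22, 0.37],
     [0.94, 0.04, 0.72, 0.18]],
    index=['A', 'B', 'C', 'All'],
    columns=[2011, 2012, 2013, 'All'],
)

# Scalar: a single column label.
one_column = df.loc[:, 'All']

# List: several column labels.
some_columns = df.loc[:, [2011, 2012]]

# Tuple: (row_indexer, column_indexer), built here with pd.IndexSlice.
# Label slices include both endpoints, so 'B':'C' selects rows B and C.
block = df.loc[pd.IndexSlice['B':'C', [2011, 2012]]]
print(block)
```

Every indexer here is a label, not a position.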

I'm trying to understand why it's failing in some cases.

Let's say I want to apply a bar to all rows but the first and last, and all columns but the last.

This IndexSlice works:

In [2]: df.ix[pd.IndexSlice[1:-1,:-1]]
Out[2]:
   2011  2012  2013
B  0.49  0.16  0.43
C  0.16  0.20  0.22
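
Note for readers on current pandas: .ix was deprecated in 0.20 and removed in 1.0, so In [2] no longer runs as written. A minimal sketch of the same positional selection with .iloc, again with the values hard-coded from the printed frame:

```python
import pandas as pd

# Values copied from the printed frame so the example is reproducible.
df = pd.DataFrame(
    [[0.94, 0.17, 0.06, 0.64],
     [0.49, 0.16, 0.43, 0.64],
     [0.16, 0.20, 0.22, 0.37],
     [0.94, 0.04, 0.72, 0.18]],
    index=['A', 'B', 'C', 'All'],
    columns=[2011, 2012, 2013, 'All'],
)

# Purely positional: drop the first and last rows and the last column.
inner = df.iloc[1:-1, :-1]
print(inner)
```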

But when passed to style.bar, it doesn't:

In [3]: df.style.bar(subset=pd.IndexSlice[1:-1,:-1], color='#d65f5f')

TypeError: cannot do slice indexing on <class 'pandas.indexes.base.Index'>
with these indexers [1] of <class 'int'>

Whereas if I pass it slightly differently, it works:

In [4]: df.style.bar(subset=pd.IndexSlice[df.index[1:-1],df.columns[:-1]],
                     color='#d65f5f')

df.style.bar works as expected

I'm confused about why the first form doesn't work. pd.IndexSlice seems to be sparsely documented, so maybe I'm missing something. It could also be a bug in pd.style (which is fairly new, available only since 0.17.1).

Can someone explain what is wrong?

Julien Marrec asked Dec 08 '16 23:12



1 Answer

It's too bad this compatibility issue exists. From what I can tell, though, you answered your own question. The docs you quoted include this line:

A tuple is treated as (row_indexer, column_indexer)

The first slice is a tuple too, but its elements are integer (positional) slices, which a label-based lookup on this index does not accept:

In [1]: pd.IndexSlice[1:-1,:-1]
Out[1]: (slice(1, -1, None), slice(None, -1, None))

The second form, by contrast, yields label-based indexers:

In [2]: pd.IndexSlice[df.index[1:-1],df.columns[:-1]]
Out[2]: (Index(['B', 'C'], dtype='object'), Index([2011, 2012, 2013], dtype='object'))
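
The difference between the two tuples can be reproduced without the styler at all. The sketch below assumes that subset is applied through label-based, .loc-style indexing, which the TypeError in the question suggests. The frame is hard-coded from the question's printed output, and the exact error message may differ between pandas versions.

```python
import pandas as pd

# Values copied from the question's printed frame.
df = pd.DataFrame(
    [[0.94, 0.17, 0.06, 0.64],
     [0.49, 0.16, 0.43, 0.64],
     [0.16, 0.20, 0.22, 0.37],
     [0.94, 0.04, 0.72, 0.18]],
    index=['A', 'B', 'C', 'All'],
    columns=[2011, 2012, 2013, 'All'],
)

# Integer slices are positions, not labels, so .loc rejects them
# on an index that does not contain those integers.
positional = pd.IndexSlice[1:-1, :-1]
try:
    df.loc[positional]
    loc_error = None
except TypeError as exc:
    loc_error = exc
print(type(loc_error).__name__, loc_error)

# Index objects carry the labels themselves, so .loc accepts them.
labelled = pd.IndexSlice[df.index[1:-1], df.columns[:-1]]
selected = df.loc[labelled]
print(selected)
```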

In this second case I don't think pd.IndexSlice does anything beyond wrapping the contents in a tuple, so you can pass the tuple directly:

df.style.bar(subset=(df.index[1:-1], df.columns[:-1]),
             color='#d65f5f')
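
If you need this in more than one place, the position-to-label translation can be wrapped in a small function. This is only a sketch: positional_subset is a hypothetical helper, not part of pandas, and it assumes the row and column labels are unique. The frame is hard-coded from the question's printed output.

```python
import pandas as pd


def positional_subset(df, rows, cols):
    """Turn positional row/column selectors into label-based indexers.

    Hypothetical helper, not a pandas API. Assumes unique labels.
    """
    return df.index[rows], df.columns[cols]


# Values copied from the question's printed frame.
df = pd.DataFrame(
    [[0.94, 0.17, 0.06, 0.64],
     [0.49, 0.16, 0.43, 0.64],
     [0.16, 0.20, 0.22, 0.37],
     [0.94, 0.04, 0.72, 0.18]],
    index=['A', 'B', 'C', 'All'],
    columns=[2011, 2012, 2013, 'All'],
)

subset = positional_subset(df, slice(1, -1), slice(None, -1))

# pd.IndexSlice[...] hands back the key it receives, so wrapping the
# two indexers in it produces the same kind of tuple.
wrapped = pd.IndexSlice[subset[0], subset[1]]

print(df.loc[subset])
```

With jinja2 installed (the styler needs it), the result can then be passed as df.style.bar(subset=subset, color='#d65f5f').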
AlexG answered Oct 05 '22 11:10
