Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using slicers on a multi-index

I've got a dataframe of the form:

Contract  Date      
201501    2014-04-29    1416.0
          2014-04-30    1431.1
          2014-05-01    1430.6
          2014-05-02    1443.9
          2014-05-05    1451.6
          2014-05-06    1461.4
          2014-05-07    1456.0
          2014-05-08    1441.1
          2014-05-09    1437.8
          2014-05-12    1445.2
          2014-05-13    1458.2
          2014-05-14    1487.6
          2014-05-15    1477.6
          2014-05-16    1467.9
          2014-05-19    1484.9
          2014-05-20    1470.5
          2014-05-21    1476.9
          2014-05-22    1490.0
          2014-05-23    1473.3
          2014-05-27    1462.5
          2014-05-28    1456.3
          2014-05-29    1460.5
201507    2014-05-30    1463.5
          2014-06-02    1447.5
          2014-06-03    1444.4
          2014-06-04    1444.7
          2014-06-05    1455.9
          2014-06-06    1464.0

Where Contract & Date are indices of type int and datetime64 respectively.

What I want is to select a date range. It works by doing:

df.reset_index('Contract', drop=True).loc['2014-09']

But I hate this as it loses the index/is not very pleasant (I have to do a lot of these).

I think I should be able to do it like this:

df.loc[:,'2014-09']

to bring back all of September 2014's data. In reality, this doesn't work. I can only select a single day by doing:

df.loc[:,'2014-09-02']

Why doesn't my multi-index slicer work?

like image 590
cjm2671 Avatar asked Jul 10 '16 16:07

cjm2671


1 Answers

Pandas needs you to be explicit about whether you're selecting columns or sub-levels of a hierarchical index. In this case, df.loc[:,'2014-09'] fails because pandas tries to get all rows and then look for a column labelled '2014-09' (which doesn't exist).

Instead, you need to give both levels of the multi-index and the column labels/slice.

To select all of the May 2014 data from your example you could write:

>>> df.loc[(slice(None), '2014-05'), :]                            
Contract Date              
201501   2014-05-01  1430.6
         2014-05-02  1443.9
         2014-05-05  1451.6
         2014-05-06  1461.4
         2014-05-07  1456.0
         2014-05-08  1441.1
         2014-05-09  1437.8
         2014-05-12  1445.2
         2014-05-13  1458.2
         2014-05-14  1487.6
         2014-05-15  1477.6
         2014-05-16  1467.9
         2014-05-19  1484.9
         2014-05-20  1470.5
         2014-05-21  1476.9
         2014-05-22  1490.0
         2014-05-23  1473.3
         2014-05-27  1462.5
         2014-05-28  1456.3
         2014-05-29  1460.5
201507   2014-05-30  1463.5

Here [(slice(None), '2014-05'), :] translates to a slice of [:, '2014-05'] for the rows and [:] for the columns.

The pd.IndexSlice object was introduced to make these slice semantics a little easier:

>>> idx = pd.IndexSlice
>>> df.loc[idx[:, '2014-05'], :]
# same slice of DataFrame
like image 130
Alex Riley Avatar answered Oct 16 '22 07:10

Alex Riley