
MultiIndex-based indexing in pandas [duplicate]

If I define a hierarchically-indexed dataframe like this:

    import itertools
    import pandas as pd
    import numpy as np

    a = ('A', 'B')
    i = (0, 1, 2)
    b = (True, False)
    idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                    names=('Alpha', 'Int', 'Bool'))
    df = pd.DataFrame(np.random.randn(len(idx), 7), index=idx,
                      columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))

the contents look like this:

    In [19]: df
    Out[19]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -0.462924  1.210442  0.306737  0.325116 -1.320084 -0.831699  0.892865
              False -0.850570 -0.949779  0.022074 -0.205575 -0.684794 -0.214307 -1.133833
          1   True   0.603602  1.387020 -0.830780 -1.242000 -0.321938  0.484271  0.171738
              False -1.591730  1.282136  0.095159 -1.239882  0.760880 -0.606444 -0.485957
          2   True  -1.346883  1.650247 -1.476443  2.092067  1.344689  0.177083  0.100844
              False  0.001407 -1.127299 -0.417828  0.143595 -0.277838 -0.478262 -0.350906
    B     0   True   0.722781 -1.093182  0.237536  0.457614 -2.500885  0.338257  0.009128
              False  0.321022  0.419357  1.161140 -1.371035  1.093696  0.250517 -1.125612
          1   True   0.237441  1.739933  0.029653  0.327823 -0.384647  1.523628 -0.009053
              False -0.459148 -0.598577 -0.593486 -0.607447  1.478399  0.504028 -0.329555
          2   True  -0.583052 -0.986493 -0.057788 -0.639798  1.400311  0.076471 -0.212513
              False  0.896755  2.583520  1.520151  2.367336 -1.084994 -1.233548 -2.414215

I know how to extract the data corresponding to a given column. E.g. for column 'VII':

    In [20]: df['VII']
    Out[20]:
    Alpha  Int  Bool
    A      0    True     0.892865
                False   -1.133833
           1    True     0.171738
                False   -0.485957
           2    True     0.100844
                False   -0.350906
    B      0    True     0.009128
                False   -1.125612
           1    True    -0.009053
                False   -0.329555
           2    True    -0.212513
                False   -2.414215
    Name: VII

How do I extract the data matching the following sets of criteria:

  1. Alpha=='B'
  2. Alpha=='B', Bool==False
  3. Alpha=='B', Bool==False, column 'I'
  4. Alpha=='B', Bool==False, columns 'I' and 'III'
  5. Alpha=='B', Bool==False, columns 'I', 'III', and all columns from 'V' onwards
  6. Int is even

(BTW, I did rtfm, more than once even, but I really find it incomprehensible.)

kjo asked Feb 19 '13 18:02



2 Answers

xs may be what you want. Here are a few examples:

    In [63]: df.xs(('B',), level='Alpha')
    Out[63]:
                      I        II       III        IV         V        VI       VII
    Int Bool
    0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
        False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   True  -0.450375  1.237018  0.398290  0.246182 -0.237919  1.372239 -0.805403
        False -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
        False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

    In [64]: df.xs(('B', False), level=('Alpha', 'Bool'))
    Out[64]:
                I        II       III        IV         V        VI       VII
    Int
    0    0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
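The `xs` calls above cover criteria 1 and 2. Criteria 3 and 4 only add a column selection, so one possible way to handle them is to index the result of `xs` as usual. The following is a minimal sketch rather than a definitive recipe: it rebuilds a frame shaped like the one in the question with fresh random values, and the names `sub`, `col_i`, and `cols_i_iii` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same index and columns as in the question
# (the values are random, so they will not match the output shown above).
idx = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)],
    names=("Alpha", "Int", "Bool"),
)
df = pd.DataFrame(
    np.random.randn(len(idx), 7),
    index=idx,
    columns=["I", "II", "III", "IV", "V", "VI", "VII"],
)

# Criterion 2: rows with Alpha == 'B' and Bool == False.
sub = df.xs(("B", False), level=("Alpha", "Bool"))

# Criterion 3: the same rows, restricted to column 'I' (a Series).
col_i = sub["I"]

# Criterion 4: the same rows, restricted to columns 'I' and 'III'.
cols_i_iii = sub[["I", "III"]]
```

Note that `xs` drops the levels it selects on by default, so `sub` is indexed by `Int` alone.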

Edit:

For the last requirement you can chain get_level_values and isin:

First, get the even values of the 'Int' level (there are other ways to do this too):

    In [87]: ix_vals = set(i for _, i, _ in df.index if i % 2 == 0)
             ix_vals
    Out[87]: set([0L, 2L])

Then use these with isin:

    In [89]: ix = df.index.get_level_values('Int').isin(ix_vals)
    In [90]: df[ix]
    Out[90]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -1.315409  1.203800  0.330372 -0.295718 -0.679039  1.402114  0.778572
              False  0.008189 -0.104372  0.419110  0.302978 -0.880262 -1.037645 -0.264265
          2   True  -2.414290  0.896990  0.986167 -0.527074  0.550753 -0.302920  0.228165
              False  1.275831  0.448089 -0.635874 -0.733855 -0.747774 -1.108976  0.151474
    B     0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
              False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
          2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
              False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
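As one of the "other ways" to pick the even `Int` values, the boolean mask can probably be built directly from the level values, without collecting them into a set first. This is a small sketch under the assumption that the `Int` level holds plain integers; the frame is rebuilt with random values and `even_rows` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question; the values are random.
idx = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)],
    names=("Alpha", "Int", "Bool"),
)
df = pd.DataFrame(
    np.random.randn(len(idx), 7),
    index=idx,
    columns=["I", "II", "III", "IV", "V", "VI", "VII"],
)

# One value per row, taken from the 'Int' level of the index.
ints = df.index.get_level_values("Int")

# Criterion 6: keep the rows whose 'Int' value is even (0 and 2 here).
even_rows = df[ints % 2 == 0]
```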
beardc answered Oct 23 '22 21:10


You can use pd.IndexSlice for a more intuitive approach (inspired by this answer). Some examples (using pandas 0.18.0):

    df.sort_index(inplace=True)
    idx = pd.IndexSlice
    # even values of the 'Int' level; the range starts at 2, so Int == 0 is not included
    evens = np.arange(2,max(df.index.levels[1])+1,2)

    df.loc[idx[['A','B'],evens,True],['III','V']]
    Out[]:
                         III         V
    Alpha Int Bool
    A     2   True -1.041243 -0.561155
    B     2   True  0.381918 -0.148990

    df.loc[idx[:,evens,:],:]
    Out[]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     2   False  0.791142  0.333383  0.089767 -0.584465  0.295676 -1.323792  0.717301
              True  -1.023160 -0.442004 -1.041243  1.613184 -0.561155  0.397923 -0.133701
    B     2   False  0.383229 -0.052715 -0.214347 -2.041429 -1.101059 -0.374035  0.166314
              True  -0.183386 -0.855367  0.381918 -0.261106 -0.148990  0.621537  0.517513
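The same `pd.IndexSlice` pattern can plausibly be stretched to the remaining criteria from the question. The sketch below is an illustration under a few assumptions, not a tested recipe for pandas 0.18.0: it rebuilds the frame with random values, sorts the index first (label slicing on a MultiIndex generally needs that), and uses made-up names such as `rows`, `c3`, and `cols5`. Unlike the `np.arange(2, ...)` call above, the list of even values here includes 0.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question; the values are random.
index = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)],
    names=("Alpha", "Int", "Bool"),
)
df = pd.DataFrame(
    np.random.randn(len(index), 7),
    index=index,
    columns=["I", "II", "III", "IV", "V", "VI", "VII"],
).sort_index()

idx = pd.IndexSlice

# Row selector for Alpha == 'B', any Int, Bool == False.
rows = idx["B", :, False]

# Criterion 3: those rows, column 'I'.
c3 = df.loc[rows, "I"]

# Criterion 4: those rows, columns 'I' and 'III'.
c4 = df.loc[rows, ["I", "III"]]

# Criterion 5: 'I', 'III', and every column from 'V' onwards.
cols5 = ["I", "III", *df.columns[df.columns.get_loc("V"):]]
c5 = df.loc[rows, cols5]

# Criterion 6: every row whose 'Int' value is even, 0 included.
evens = [i for i in df.index.levels[1] if i % 2 == 0]
c6 = df.loc[idx[:, evens, :], :]
```

Building `cols5` from column positions keeps the "from 'V' onwards" part independent of how many columns follow 'V'.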
TNT answered Oct 23 '22 22:10