If I define a hierarchically-indexed dataframe like this:

    import itertools
    import pandas as pd
    import numpy as np
    a = ('A', 'B')
    i = (0, 1, 2)
    b = (True, False)
    idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                    names=('Alpha', 'Int', 'Bool'))
    df = pd.DataFrame(np.random.randn(len(idx), 7), index=idx,
                      columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))

the contents look like this:

    In [19]: df
    Out[19]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -0.462924  1.210442  0.306737  0.325116 -1.320084 -0.831699  0.892865
              False -0.850570 -0.949779  0.022074 -0.205575 -0.684794 -0.214307 -1.133833
          1   True   0.603602  1.387020 -0.830780 -1.242000 -0.321938  0.484271  0.171738
              False -1.591730  1.282136  0.095159 -1.239882  0.760880 -0.606444 -0.485957
          2   True  -1.346883  1.650247 -1.476443  2.092067  1.344689  0.177083  0.100844
              False  0.001407 -1.127299 -0.417828  0.143595 -0.277838 -0.478262 -0.350906
    B     0   True   0.722781 -1.093182  0.237536  0.457614 -2.500885  0.338257  0.009128
              False  0.321022  0.419357  1.161140 -1.371035  1.093696  0.250517 -1.125612
          1   True   0.237441  1.739933  0.029653  0.327823 -0.384647  1.523628 -0.009053
              False -0.459148 -0.598577 -0.593486 -0.607447  1.478399  0.504028 -0.329555
          2   True  -0.583052 -0.986493 -0.057788 -0.639798  1.400311  0.076471 -0.212513
              False  0.896755  2.583520  1.520151  2.367336 -1.084994 -1.233548 -2.414215

I know how to extract the data corresponding to a given column. E.g. for column 'VII':

    In [20]: df['VII']
    Out[20]:
    Alpha  Int  Bool
    A      0    True     0.892865
                False   -1.133833
           1    True     0.171738
                False   -0.485957
           2    True     0.100844
                False   -0.350906
    B      0    True     0.009128
                False   -1.125612
           1    True    -0.009053
                False   -0.329555
           2    True    -0.212513
                False   -2.414215
    Name: VII

How do I extract the data matching the following sets of criteria:

1. Alpha=='B'
2. Alpha=='B', Bool==False
3. Alpha=='B', Bool==False, column 'I'
4. Alpha=='B', Bool==False, columns 'I' and 'III'
5. Alpha=='B', Bool==False, columns 'I', 'III', and all columns from 'V' onwards
6. Int is even

(BTW, I did rtfm, more than once even, but I really find it incomprehensible.)
xs may be what you want. Here are a few examples:

    In [63]: df.xs(('B',), level='Alpha')
    Out[63]:
                      I        II       III        IV         V        VI       VII
    Int Bool
    0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
        False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   True  -0.450375  1.237018  0.398290  0.246182 -0.237919  1.372239 -0.805403
        False -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
        False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

    In [64]: df.xs(('B', False), level=('Alpha', 'Bool'))
    Out[64]:
                I        II       III        IV         V        VI       VII
    Int
    0    0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

Edit:
For the last requirement you can chain get_level_values and isin:
First, get the even values in the index (there are other ways to do this too):

    In [87]: ix_vals = set(i for _, i, _ in df.index if i % 2 == 0)
             ix_vals
    Out[87]: set([0L, 2L])

Use these with isin:

    In [89]: ix = df.index.get_level_values('Int').isin(ix_vals)
    In [90]: df[ix]
    Out[90]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -1.315409  1.203800  0.330372 -0.295718 -0.679039  1.402114  0.778572
              False  0.008189 -0.104372  0.419110  0.302978 -0.880262 -1.037645 -0.264265
          2   True  -2.414290  0.896990  0.986167 -0.527074  0.550753 -0.302920  0.228165
              False  1.275831  0.448089 -0.635874 -0.733855 -0.747774 -1.108976  0.151474
    B     0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
              False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
          2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
              False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

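As one of the "other ways" mentioned above, the even test can probably be applied straight to the level values, which avoids building the intermediate set. This is only a minimal sketch: it rebuilds the question's frame with a fixed seed (an assumption, so the example is repeatable), and the names `int_level`, `evens_mask` and `even_rows` are made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the frame from the question. The fixed seed is an assumption,
# used only so the example gives the same numbers on every run.
index = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), (True, False))),
    names=('Alpha', 'Int', 'Bool'),
)
df = pd.DataFrame(np.random.RandomState(0).randn(len(index), 7), index=index,
                  columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))

# One 'Int' label per row, as a flat Index.
int_level = df.index.get_level_values('Int')

# Boolean mask: True for rows whose 'Int' label is even (0 counts as even).
evens_mask = int_level % 2 == 0

# Boolean-mask indexing keeps all three index levels intact.
even_rows = df[evens_mask]
print(even_rows)
```

The mask is an ordinary boolean array, so it can be combined with other conditions using `&` and `|` before indexing.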
You can use pd.IndexSlice for a more intuitive approach (inspired by this answer). Some examples (using pandas 0.18.0):

    df.sort_index(inplace=True)
    idx = pd.IndexSlice
    evens = np.arange(2,max(df.index.levels[1])+1,2)

    df.loc[idx[['A','B'],evens,True],['III','V']]
    Out[]:
                         III         V
    Alpha Int Bool
    A     2   True -1.041243 -0.561155
    B     2   True  0.381918 -0.148990

    df.loc[idx[:,evens,:],:]
    Out[]:
                            I        II       III        IV         V        VI  \
    Alpha Int Bool
    A     2   False  0.791142  0.333383  0.089767 -0.584465  0.295676 -1.323792
              True  -1.023160 -0.442004 -1.041243  1.613184 -0.561155  0.397923
    B     2   False  0.383229 -0.052715 -0.214347 -2.041429 -1.101059 -0.374035
              True  -0.183386 -0.855367  0.381918 -0.261106 -0.148990  0.621537

                          VII
    Alpha Int Bool
    A     2   False  0.717301
              True  -0.133701
    B     2   False  0.166314
              True   0.517513

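For completeness, here is how the six criteria from the question might be written with `pd.IndexSlice`. Treat it as a sketch under assumptions rather than a definitive recipe: the frame is rebuilt with a fixed seed, the result names (`b_rows`, `b_false`, and so on) are invented for illustration, and behaviour on pandas versions other than the one you run may differ. Note that `np.arange(2, ...)` above starts at 2 and so leaves out `Int == 0`; the version below filters the level values themselves so that 0 is included.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the frame from the question. The fixed seed is an assumption,
# used only so the example is repeatable.
index = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), (True, False))),
    names=('Alpha', 'Int', 'Bool'),
)
df = pd.DataFrame(np.random.RandomState(0).randn(len(index), 7), index=index,
                  columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))

# Label slicing on a MultiIndex needs a lexsorted index.
df = df.sort_index()
idx = pd.IndexSlice

# 1. Alpha == 'B'
b_rows = df.loc[idx['B', :, :], :]

# 2. Alpha == 'B', Bool == False
b_false = df.loc[idx['B', :, False], :]

# 3. Same rows, column 'I' only (a scalar column label gives a Series)
b_false_i = df.loc[idx['B', :, False], 'I']

# 4. Same rows, columns 'I' and 'III'
b_false_two = df.loc[idx['B', :, False], ['I', 'III']]

# 5. Same rows, columns 'I', 'III', and everything from 'V' onwards
cols = ['I', 'III'] + list(df.columns[df.columns.get_loc('V'):])
b_false_cols = df.loc[idx['B', :, False], cols]

# 6. Int is even (taken from the level values, so 0 is included)
evens = [i for i in df.index.levels[1] if i % 2 == 0]
even_rows = df.loc[idx[:, evens, :], :]

print(b_false_cols)
print(even_rows)
```

The column list in step 5 is built by position because a single label slice cannot express "these two columns plus a trailing range".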