MultiIndex-based indexing in pandas [duplicate]

If I define a hierarchically-indexed dataframe like this:

    import itertools
    import pandas as pd
    import numpy as np

    a = ('A', 'B')
    i = (0, 1, 2)
    b = (True, False)
    idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                    names=('Alpha', 'Int', 'Bool'))
    df = pd.DataFrame(np.random.randn(len(idx), 7), index=idx,
                      columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))

the contents look like this:

    In [19]: df
    Out[19]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -0.462924  1.210442  0.306737  0.325116 -1.320084 -0.831699  0.892865
              False -0.850570 -0.949779  0.022074 -0.205575 -0.684794 -0.214307 -1.133833
          1   True   0.603602  1.387020 -0.830780 -1.242000 -0.321938  0.484271  0.171738
              False -1.591730  1.282136  0.095159 -1.239882  0.760880 -0.606444 -0.485957
          2   True  -1.346883  1.650247 -1.476443  2.092067  1.344689  0.177083  0.100844
              False  0.001407 -1.127299 -0.417828  0.143595 -0.277838 -0.478262 -0.350906
    B     0   True   0.722781 -1.093182  0.237536  0.457614 -2.500885  0.338257  0.009128
              False  0.321022  0.419357  1.161140 -1.371035  1.093696  0.250517 -1.125612
          1   True   0.237441  1.739933  0.029653  0.327823 -0.384647  1.523628 -0.009053
              False -0.459148 -0.598577 -0.593486 -0.607447  1.478399  0.504028 -0.329555
          2   True  -0.583052 -0.986493 -0.057788 -0.639798  1.400311  0.076471 -0.212513
              False  0.896755  2.583520  1.520151  2.367336 -1.084994 -1.233548 -2.414215

I know how to extract the data corresponding to a given column. E.g. for column 'VII':

    In [20]: df['VII']
    Out[20]:
    Alpha  Int  Bool
    A      0    True     0.892865
                False   -1.133833
           1    True     0.171738
                False   -0.485957
           2    True     0.100844
                False   -0.350906
    B      0    True     0.009128
                False   -1.125612
           1    True    -0.009053
                False   -0.329555
           2    True    -0.212513
                False   -2.414215
    Name: VII

How do I extract the data matching the following sets of criteria:

Alpha=='B'
Alpha=='B', Bool==False
Alpha=='B', Bool==False, column 'I'
Alpha=='B', Bool==False, columns 'I' and 'III'
Alpha=='B', Bool==False, columns 'I', 'III', and all columns from 'V' onwards
Int is even

(BTW, I did rtfm, more than once even, but I really find it incomprehensible.)

840

asked Feb 19 '13 18:02

kjo

2 Answers

xs may be what you want. Here are a few examples:

    In [63]: df.xs(('B',), level='Alpha')
    Out[63]:
                      I        II       III        IV         V        VI       VII
    Int Bool
    0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
        False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   True  -0.450375  1.237018  0.398290  0.246182 -0.237919  1.372239 -0.805403
        False -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
        False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

    In [64]: df.xs(('B', False), level=('Alpha', 'Bool'))
    Out[64]:
                I        II       III        IV         V        VI       VII
    Int
    0    0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
    1   -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
    2   -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
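The xs calls above cover the row criteria only. For the column criteria in the question, one possible approach is to select columns from the xs result afterwards. This is a minimal sketch, assuming a frame built the same way as in the question (rebuilt here with MultiIndex.from_product); the names midx, sub and wanted are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
midx = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)], names=("Alpha", "Int", "Bool")
)
df = pd.DataFrame(
    np.random.randn(len(midx), 7),
    index=midx,
    columns=("I", "II", "III", "IV", "V", "VI", "VII"),
)

# Alpha == 'B', Bool == False; xs drops the levels that were selected on.
sub = df.xs(("B", False), level=("Alpha", "Bool"))

col_i = sub["I"]                # single column -> Series
cols_i_iii = sub[["I", "III"]]  # two columns -> DataFrame

# 'I', 'III', and every column from 'V' onwards.
# A label slice with .loc includes its start label.
wanted = ["I", "III"] + list(sub.loc[:, "V":].columns)
cols_from_v = sub[wanted]
```

Note that xs removes the Alpha and Bool levels from the result, so this only fits if you do not need those levels afterwards.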

Edit:

For the last requirement you can chain get_level_values and isin:

Get the even values in the index (other ways to do this too)

    In [87]: ix_vals = set(i for _, i, _ in df.index if i % 2 == 0)
             ix_vals
    Out[87]: set([0L, 2L])

Use these with isin

    In [89]: ix = df.index.get_level_values('Int').isin(ix_vals)
    In [90]: df[ix]
    Out[90]:
                            I        II       III        IV         V        VI       VII
    Alpha Int Bool
    A     0   True  -1.315409  1.203800  0.330372 -0.295718 -0.679039  1.402114  0.778572
              False  0.008189 -0.104372  0.419110  0.302978 -0.880262 -1.037645 -0.264265
          2   True  -2.414290  0.896990  0.986167 -0.527074  0.550753 -0.302920  0.228165
              False  1.275831  0.448089 -0.635874 -0.733855 -0.747774 -1.108976  0.151474
    B     0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
              False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
          2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
              False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
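One of the "other ways" for the even-Int case is to skip the intermediate set and build the boolean mask directly from the level values. A minimal sketch, again assuming a frame shaped like the one in the question; midx, is_even and even_rows are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
midx = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)], names=("Alpha", "Int", "Bool")
)
df = pd.DataFrame(
    np.random.randn(len(midx), 7),
    index=midx,
    columns=("I", "II", "III", "IV", "V", "VI", "VII"),
)

# get_level_values returns one value per row for the 'Int' level;
# the modulo and comparison turn that into a boolean mask over the rows.
is_even = df.index.get_level_values("Int") % 2 == 0
even_rows = df[is_even]
```

This keeps all three index levels, the same as the isin version.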

178

answered Oct 23 '22 21:10

beardc

You can use pd.IndexSlice for a more intuitive way to do this (inspired by another answer). Some examples (using pandas 0.18.0):

    df.sort_index(inplace=True)
    idx = pd.IndexSlice
    # note: the range starts at 2, so rows with Int == 0 are not selected
    evens = np.arange(2,max(df.index.levels[1])+1,2)

    df.loc[idx[['A','B'],evens,True],['III','V']]
    Out[]:
                         III         V
    Alpha Int Bool
    A     2   True -1.041243 -0.561155
    B     2   True  0.381918 -0.148990

    df.loc[idx[:,evens,:],:]
    Out[]:
                            I        II       III        IV         V        VI  \
    Alpha Int Bool
    A     2   False  0.791142  0.333383  0.089767 -0.584465  0.295676 -1.323792
              True  -1.023160 -0.442004 -1.041243  1.613184 -0.561155  0.397923
    B     2   False  0.383229 -0.052715 -0.214347 -2.041429 -1.101059 -0.374035
              True  -0.183386 -0.855367  0.381918 -0.261106 -0.148990  0.621537

                          VII
    Alpha Int Bool
    A     2   False  0.717301
              True  -0.133701
    B     2   False  0.166314
              True   0.517513
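The same IndexSlice pattern can probably be stretched over all six criteria from the question. The following is a sketch under assumptions, not tested against pandas 0.18.0: the frame is rebuilt as in the question, midx and wanted are illustrative names, and evens is derived from the level values so that 0 is included.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
midx = pd.MultiIndex.from_product(
    [("A", "B"), (0, 1, 2), (True, False)], names=("Alpha", "Int", "Bool")
)
df = pd.DataFrame(
    np.random.randn(len(midx), 7),
    index=midx,
    columns=("I", "II", "III", "IV", "V", "VI", "VII"),
).sort_index()  # sort first, as in the answer above

idx = pd.IndexSlice

only_b = df.loc[idx["B", :, :], :]                        # Alpha == 'B'
b_false = df.loc[idx["B", :, False], :]                   # ... and Bool == False
b_false_i = df.loc[idx["B", :, False], "I"]               # ... column 'I'
b_false_i_iii = df.loc[idx["B", :, False], ["I", "III"]]  # ... columns 'I', 'III'

# 'I', 'III', and all columns from 'V' onwards: build the label list first.
wanted = ["I", "III"] + list(df.loc[:, "V":].columns)
b_false_from_v = df.loc[idx["B", :, False], wanted]

# Int is even: filter the unique values of the 'Int' level, keeping 0.
ints = df.index.levels[1]
evens = ints[ints % 2 == 0]
even_rows = df.loc[idx[:, evens, :], :]
```

Unlike xs, these selections keep the Alpha and Bool levels in the result index.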

answered Oct 23 '22 22:10

TNT
