If I define a hierarchically-indexed dataframe like this:
import itertools
import pandas as pd
import numpy as np

a = ('A', 'B')
i = (0, 1, 2)
b = (True, False)
idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                names=('Alpha', 'Int', 'Bool'))
df = pd.DataFrame(np.random.randn(len(idx), 7), index=idx,
                  columns=('I', 'II', 'III', 'IV', 'V', 'VI', 'VII'))
the contents look like this:
In [19]: df
Out[19]:
                        I        II       III        IV         V        VI       VII
Alpha Int Bool
A     0   True  -0.462924  1.210442  0.306737  0.325116 -1.320084 -0.831699  0.892865
          False -0.850570 -0.949779  0.022074 -0.205575 -0.684794 -0.214307 -1.133833
      1   True   0.603602  1.387020 -0.830780 -1.242000 -0.321938  0.484271  0.171738
          False -1.591730  1.282136  0.095159 -1.239882  0.760880 -0.606444 -0.485957
      2   True  -1.346883  1.650247 -1.476443  2.092067  1.344689  0.177083  0.100844
          False  0.001407 -1.127299 -0.417828  0.143595 -0.277838 -0.478262 -0.350906
B     0   True   0.722781 -1.093182  0.237536  0.457614 -2.500885  0.338257  0.009128
          False  0.321022  0.419357  1.161140 -1.371035  1.093696  0.250517 -1.125612
      1   True   0.237441  1.739933  0.029653  0.327823 -0.384647  1.523628 -0.009053
          False -0.459148 -0.598577 -0.593486 -0.607447  1.478399  0.504028 -0.329555
      2   True  -0.583052 -0.986493 -0.057788 -0.639798  1.400311  0.076471 -0.212513
          False  0.896755  2.583520  1.520151  2.367336 -1.084994 -1.233548 -2.414215
I know how to extract the data corresponding to a given column, e.g. for column 'VII':
In [20]: df['VII']
Out[20]:
Alpha  Int  Bool
A      0    True     0.892865
            False   -1.133833
       1    True     0.171738
            False   -0.485957
       2    True     0.100844
            False   -0.350906
B      0    True     0.009128
            False   -1.125612
       1    True    -0.009053
            False   -0.329555
       2    True    -0.212513
            False   -2.414215
Name: VII
How do I extract the data matching the following sets of criteria:
1. Alpha=='B'
2. Alpha=='B', Bool==False
3. Alpha=='B', Bool==False, column 'I'
4. Alpha=='B', Bool==False, columns 'I' and 'III'
5. Alpha=='B', Bool==False, columns 'I', 'III', and all columns from 'V' onwards
6. Int is even

(BTW, I did RTFM, more than once even, but I really find it incomprehensible.)
xs may be what you want. Here are a few examples:
In [63]: df.xs(('B',), level='Alpha')
Out[63]:
                   I        II       III        IV         V        VI       VII
Int Bool
0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
    False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
1   True  -0.450375  1.237018  0.398290  0.246182 -0.237919  1.372239 -0.805403
    False -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
    False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494

In [64]: df.xs(('B', False), level=('Alpha', 'Bool'))
Out[64]:
            I        II       III        IV         V        VI       VII
Int
0    0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
1   -0.064493  0.967132 -0.674451  0.666691 -0.350378  1.721682 -0.791897
2   -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
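Because xs drops the levels it matched, criteria 3 to 5 (one column, two columns, or 'I' and 'III' plus everything from 'V' on) reduce to ordinary column selection on its result. A minimal sketch, assuming a reasonably recent pandas; the seeded generator and the names b_false, start and wanted are only for illustration:

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the question's frame; a seeded generator keeps runs reproducible.
a = ('A', 'B')
i = (0, 1, 2)
b = (True, False)
idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                names=('Alpha', 'Int', 'Bool'))
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(idx), 7)), index=idx,
                  columns=['I', 'II', 'III', 'IV', 'V', 'VI', 'VII'])

# Criterion 2: Alpha == 'B' and Bool == False.
# xs drops the matched levels, leaving an index on 'Int' only.
b_false = df.xs(('B', False), level=['Alpha', 'Bool'])

# Criterion 3: a single column gives a Series indexed by 'Int'.
col_i = b_false['I']

# Criterion 4: a list of columns gives a DataFrame.
cols_i_iii = b_false[['I', 'III']]

# Criterion 5: 'I', 'III' and every column from 'V' onwards.
# get_loc returns the integer position of 'V' among the columns.
start = b_false.columns.get_loc('V')
wanted = ['I', 'III'] + list(b_false.columns[start:])
cols_mixed = b_false[wanted]

print(cols_mixed)
```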
Edit:
For the last requirement you can chain get_level_values and isin:
Get the even values in the index (there are other ways to do this too):
In [87]: ix_vals = set(i for _, i, _ in df.index if i % 2 == 0)
         ix_vals
Out[87]: set([0L, 2L])
Use these with isin:
In [89]: ix = df.index.get_level_values('Int').isin(ix_vals)

In [90]: df[ix]
Out[90]:
                        I        II       III        IV         V        VI       VII
Alpha Int Bool
A     0   True  -1.315409  1.203800  0.330372 -0.295718 -0.679039  1.402114  0.778572
          False  0.008189 -0.104372  0.419110  0.302978 -0.880262 -1.037645 -0.264265
      2   True  -2.414290  0.896990  0.986167 -0.527074  0.550753 -0.302920  0.228165
          False  1.275831  0.448089 -0.635874 -0.733855 -0.747774 -1.108976  0.151474
B     0   True  -0.430563  0.139969 -0.356883 -0.574463 -0.107693 -1.030063  0.271250
          False  0.334960 -0.640764 -0.515756 -0.327806 -0.006574  0.183520  1.397951
      2   True   0.143154 -0.061543 -1.157361  0.864847 -0.379616 -0.762626  0.645582
          False -3.253589  0.729562 -0.839622 -1.088309  0.039522  0.980831 -0.113494
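One of the other ways to get the even rows is to test the level values arithmetically, which skips the intermediate set; the same per-level arrays also combine with & and ~ to express the other criteria. A minimal sketch, assuming a recent pandas; alpha, ints, bools and mask are hypothetical names chosen for illustration:

```python
import itertools

import numpy as np
import pandas as pd

a = ('A', 'B')
i = (0, 1, 2)
b = (True, False)
idx = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                                names=('Alpha', 'Int', 'Bool'))
df = pd.DataFrame(np.random.default_rng(0).standard_normal((len(idx), 7)),
                  index=idx, columns=['I', 'II', 'III', 'IV', 'V', 'VI', 'VII'])

# Pull each level out as a plain NumPy array (dtype=bool for 'Bool' so
# that ~ below is a logical "not" rather than a bitwise op on objects).
alpha = df.index.get_level_values('Alpha').to_numpy()
ints = df.index.get_level_values('Int').to_numpy()
bools = df.index.get_level_values('Bool').to_numpy(dtype=bool)

# Criterion 6: Int is even, with no intermediate set of values.
even_rows = df[ints % 2 == 0]

# Level masks combine with & and ~, e.g. Alpha == 'B', Bool == False
# and Int even, restricted to columns 'I' and 'III'.
mask = (alpha == 'B') & ~bools & (ints % 2 == 0)
b_false_even = df.loc[mask, ['I', 'III']]
```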
You can use pd.IndexSlice for a more intuitive syntax (inspired by this answer). Some examples (using pandas 0.18.0):
df.sort_index(inplace=True)
idx = pd.IndexSlice
evens = np.arange(2, max(df.index.levels[1]) + 1, 2)

df.loc[idx[['A', 'B'], evens, True], ['III', 'V']]
Out[]:
                      III         V
Alpha Int Bool
A     2   True  -1.041243 -0.561155
B     2   True   0.381918 -0.148990

df.loc[idx[:, evens, :], :]
Out[]:
                        I        II       III        IV         V        VI  \
Alpha Int Bool
A     2   False  0.791142  0.333383  0.089767 -0.584465  0.295676 -1.323792
          True  -1.023160 -0.442004 -1.041243  1.613184 -0.561155  0.397923
B     2   False  0.383229 -0.052715 -0.214347 -2.041429 -1.101059 -0.374035
          True  -0.183386 -0.855367  0.381918 -0.261106 -0.148990  0.621537

                      VII
Alpha Int Bool
A     2   False  0.717301
          True  -0.133701
B     2   False  0.166314
          True   0.517513
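The same IndexSlice approach also covers criteria 1 to 5 from the question, including a scalar False on the Bool level and a label slice for "'V' onwards". A sketch, assuming a recent pandas and the question's frame rebuilt with a seeded generator; c1 to c5 and tail are illustrative names only:

```python
import itertools

import numpy as np
import pandas as pd

a = ('A', 'B')
i = (0, 1, 2)
b = (True, False)
mi = pd.MultiIndex.from_tuples(list(itertools.product(a, i, b)),
                               names=('Alpha', 'Int', 'Bool'))
df = pd.DataFrame(np.random.default_rng(0).standard_normal((len(mi), 7)),
                  index=mi, columns=['I', 'II', 'III', 'IV', 'V', 'VI', 'VII'])

# Slicing inner levels with ':' is only reliable on a sorted index.
df = df.sort_index()
idx = pd.IndexSlice

c1 = df.loc[idx['B', :, :], :]                  # Alpha == 'B'
c2 = df.loc[idx['B', :, False], :]              # ... and Bool == False
c3 = df.loc[idx['B', :, False], 'I']            # ... column 'I' (a Series)
c4 = df.loc[idx['B', :, False], ['I', 'III']]   # ... columns 'I' and 'III'

# .loc label slices include the end label, so 'V': covers 'V' to 'VII'.
tail = list(df.loc[:, 'V':].columns)
c5 = df.loc[idx['B', :, False], ['I', 'III'] + tail]
```

Note that, unlike xs, these selections keep all three index levels on the result.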