How to slice one MultiIndex DataFrame with the MultiIndex of another

Tags: python, pandas, dataframe, multi-index

I have a pandas dataframe with 3 levels of a MultiIndex. I am trying to pull out rows of this dataframe according to a list of values that correspond to two of the levels.

I have something like this:

ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
data = np.arange(len(ix))
df = pd.DataFrame(data, index=ix, columns=['hi'])
print(df)

           hi
a b   c      
1 foo baz   0
      can   1
  bar baz   2
      can   3
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11

Now I want to take all rows where index levels 'b' and 'c' are in this index:

ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])

i.e. values of hi having ('foo', 'can') or ('bar', 'baz') in levels b and c respectively: (1, 2, 5, 6, 9, 10).

So I'd like to take a slice(None) on the first level, and pull out specific tuples on the second and third levels.

Initially I thought that passing a multi-index object to .loc would pull out the values / levels that I wanted, but this isn't working. What's the best way to do something like this?

asked Mar 25 '15 21:03

choldgraf

1 Answer

Here is a way to get this slice:

df.sort_index(inplace=True)
idx = pd.IndexSlice
df.loc[idx[:, ('foo','bar'), 'can'], :]

yielding

           hi
a b   c      
1 bar can   3
  foo can   1
2 bar can   7
  foo can   5
3 bar can  11
  foo can   9

Note that you may need to sort the MultiIndex before you can slice it. pandas raises an error telling you so if the index is not sorted:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

You can read more on how to use slicers in the docs
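As a minimal sketch of the sorting step, assuming a reasonably recent pandas: the check below uses `is_monotonic_increasing` to decide whether a sort is needed, and `sort_index()` without `inplace=True` so the original frame is left untouched. The names `needs_sort`, `df_sorted` and `sliced` are only illustrative, and whether an unsorted index actually raises depends on the pandas version and on the kind of slice.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])

# 'foo' appears before 'bar' in the rows, so the index is not in sorted order.
needs_sort = not df.index.is_monotonic_increasing

# sort_index() returns a sorted copy; df itself keeps its original order.
df_sorted = df.sort_index() if needs_sort else df

# Slice on the sorted copy: every 'a', both 'b' values, only c == 'can'.
idx = pd.IndexSlice
sliced = df_sorted.loc[idx[:, ['bar', 'foo'], 'can'], :]
print(sliced)
```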

If slicers are not an option for some reason, here is an alternative that applies the .isin() method to each level:

df[df.index.get_level_values('b').isin(ix_use.get_level_values(0)) & df.index.get_level_values('c').isin(ix_use.get_level_values(1))]

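This is clearly not as concise. It also tests each level independently, so it keeps any row whose b value and c value each appear somewhere in ix_use, not only the exact (b, c) pairs.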
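Testing levels b and c separately accepts any combination of the listed values. One possible way to match whole (b, c) pairs instead is to drop level a and call `MultiIndex.isin` with `ix_use`. This is a sketch rather than the approach from the answer; `loose`, `pair_index`, `mask` and `selected` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame and the lookup index from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
ix_use = pd.MultiIndex.from_tuples(
    [('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])

# Per-level test: every b and every c value occurs in ix_use, so all rows pass.
loose = (df.index.get_level_values('b').isin(ix_use.get_level_values(0))
         & df.index.get_level_values('c').isin(ix_use.get_level_values(1)))

# Pairwise test: compare the (b, c) tuples of each row against ix_use.
pair_index = df.index.droplevel('a')
mask = pair_index.isin(ix_use)
selected = df[mask]
print(selected)
```

With the data from the question, `selected` holds the `hi` values 1, 2, 5, 6, 9 and 10, while `loose` is true for every row.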

UPDATE:

For the updated conditions in your question, here is a way to do it:

cond1 = (df.index.get_level_values('b').isin(['foo'])) & (df.index.get_level_values('c').isin(['can']))
cond2 = (df.index.get_level_values('b').isin(['bar'])) & (df.index.get_level_values('c').isin(['baz']))
df[cond1 | cond2]

producing:

           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10
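If the pairs come from `ix_use` rather than being written out by hand, the same pattern of OR-ing one condition per pair can be built in a loop. This is only a sketch of that generalization; `b`, `c` and `mask` are illustrative names, and it assumes that iterating over a MultiIndex yields plain tuples.

```python
import numpy as np
import pandas as pd

# Rebuild the frame and the lookup index from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
ix_use = pd.MultiIndex.from_tuples(
    [('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])

b = df.index.get_level_values('b')
c = df.index.get_level_values('c')

# Start with nothing selected, then OR in one condition per (b, c) pair.
mask = np.zeros(len(df), dtype=bool)
for b_val, c_val in ix_use:
    mask |= (b == b_val) & (c == c_val)

result = df[mask]
print(result)
```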

answered Sep 23 '22 00:09

Primer
