

pandas: Boolean indexing with multi index

Tags:

python

pandas

There are many questions here with similar titles, but I couldn't find one that addresses this issue.

I have dataframes from many different origins, and I want to filter one by the other. Using boolean indexing works great when the boolean series is the same size as the filtered dataframe, but not when the size of the series is the same as a higher level index of the filtered dataframe.

In short, let's say I have this dataframe:

In [4]: df = pd.DataFrame({'a':[1,1,1,2,2,2,3,3,3], 
                           'b':[1,2,3,1,2,3,1,2,3], 
                           'c':range(9)}).set_index(['a', 'b'])
Out[4]: 
     c
a b   
1 1  0
  2  1
  3  2
2 1  3
  2  4
  3  5
3 1  6
  2  7
  3  8

And this series:

In [5]: filt = pd.Series({1:True, 2:False, 3:True})
Out[5]: 
1     True
2    False
3     True
dtype: bool

And the output I want is this:

     c
a b   
1 1  0
  2  1
  3  2
3 1  6
  2  7
  3  8

I am not looking for solutions that are not using the filt series, such as:

df[df.index.get_level_values('a') != 2]
df[df.index.get_level_values('a').isin([1,3])]

I want to know if I can use my input filt series as is, as I would use a filter on c:

filt = df.c < 7
df[filt]

asked Sep 14 '14 19:09

Korem

1 Answer

You can use pd.IndexSlice.

>>> df.loc[pd.IndexSlice[filt[filt].index.values, :], :]
     c
a b   
1 1  0
  2  1
  3  2
3 1  6
  2  7
  3  8

where filt[filt].index.values is just [1, 3]. In other words

>>> df.loc[pd.IndexSlice[[1, 3], :]]
     c
a b   
1 1  0
  2  1
  3  2
3 1  6
  2  7
  3  8

so if you construct your filter a bit differently, the expression gets shorter. The advantage over Emanuele Paolini's solution df[filt[df.index.get_level_values('a')].values] is that you have more control over the indexing.
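To make the comparison concrete, here is a minimal sketch that splits the slicer expression into its steps and puts it next to the mask-based alternative. The variable names keep, by_slice, by_mask and subset are only illustrative, and the last line is one possible reading of "more control": it assumes you also want to restrict level b in the same expression.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'c': range(9)}).set_index(['a', 'b'])
filt = pd.Series({1: True, 2: False, 3: True})

# Step 1: index the boolean series with itself to keep only the True entries;
# what remains in the index are the labels of level 'a' to keep.
keep = filt[filt].index.values

# Step 2: select those labels on level 'a' and everything on level 'b'.
by_slice = df.loc[pd.IndexSlice[keep, :], :]

# Mask-based alternative: look up filt once per row via the level values.
by_mask = df[filt[df.index.get_level_values('a')].values]

# Illustrative extra: restrict level 'b' to the labels 2..3 at the same time.
subset = df.loc[pd.IndexSlice[keep, 2:3], :]

print(by_slice)
print(subset)
```

Both by_slice and by_mask select the same rows here; the slicer form just leaves room for conditions on the other levels.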

The topic of multiindex slicing is covered in more depth here.

Here is the full code:

import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,1,1,2,2,2,3,3,3], 'b':[1,2,3,1,2,3,1,2,3], 'c':range(9)}).set_index(['a', 'b'])
filt = pd.Series({1:True, 2:False, 3:True})

print(df.loc[pd.IndexSlice[[1, 3], :]])
print(df.loc[(df.index.levels[0].values[filt], slice(None)), :])
print(df.loc[pd.IndexSlice[filt[filt].index.values, :], :])
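If you need this in several places, the idea can be wrapped in a small helper. The sketch below is not a pandas API: filter_by_level is a hypothetical name, and treating labels that are missing from the filter as False is an assumption of this sketch. It expands the short filter to one boolean per row with Series.reindex and then uses ordinary boolean indexing.

```python
import pandas as pd


def filter_by_level(frame, level_filter, level):
    """Keep rows of `frame` whose label on `level` is True in `level_filter`.

    Hypothetical helper for illustration, not part of pandas.
    """
    level_values = frame.index.get_level_values(level)
    # Expand the short filter to one boolean per row. Labels that do not
    # appear in the filter become False (an assumption of this sketch).
    mask = level_filter.reindex(level_values, fill_value=False).to_numpy(dtype=bool)
    return frame.loc[mask]


df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'c': range(9)}).set_index(['a', 'b'])
filt = pd.Series({1: True, 2: False, 3: True})

result = filter_by_level(df, filt, 'a')
print(result)
```

Unlike the slicer expressions above, this version does not depend on the order of the entries in filt, because the lookup is done by label.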

answered Oct 18 '22 05:10

Markus Dutschke
