Subset a multi-indexed DataFrame based on multiple level 1 columns

Tags: python, pandas, dataframe, multi-index

I have a DataFrame with MultiIndex columns, and I want to keep only the columns whose level 1 label is 'one' or 'two', for each of the level 0 variables. I can subset for each label separately, but I would like to do it in one step so that the values stay side by side.

Here is the DataFrame:

import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples(list(zip(['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'], ['one', 'two', 'three', 'two', 'one', 'four'])))
df = pd.DataFrame(np.random.randn(2, 6), columns=index)

Here is how I subset for a single label in level 1:

df.iloc[:, df.columns.get_level_values(1) == 'one']
# or
df.xs('one', level=1, axis=1)

# but passing two labels to either command does not work, e.g.
df.xs(('one', 'two'), level=1, axis=1)

This would be the expected output:

         bar1        foo1       foo2         bar3
          one         two        two          one
0   -0.508272   -0.195379   0.865563     2.002205
1   -0.771565    1.360479   1.900931    -1.589277

asked Aug 13 '21 15:08

an10b3

4 Answers

Here's one way using pd.IndexSlice:

idnx = pd.IndexSlice[:, ['one', 'two']]
df.loc[:, idnx]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

Another way, using the little-known axis argument of pd.DataFrame.loc:

df.loc(axis=1)[:, ['one', 'two']]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

NOTE: The axis argument is not listed in the API reference for pd.DataFrame.loc, but it is shown with an example in the User Guide, in the "Using slicers" part of the MultiIndex / advanced indexing section, about halfway down.
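
As a quick check that the two spellings above select the same columns, here is a minimal sketch. It rebuilds the question's frame with a fixed seed (the seed and the variable names `wanted`, `by_slice`, `by_axis`, `same`, and `selected` are assumptions made for illustration). The column labels are sorted before being listed, because the order of the returned columns may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is arbitrary and only makes the run repeatable.
columns = pd.MultiIndex.from_arrays([
    ['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'],
    ['one', 'two', 'three', 'two', 'one', 'four'],
])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2, 6)), columns=columns)

wanted = ['one', 'two']  # level 1 labels to keep

# Spelling 1: build the column key with pd.IndexSlice.
by_slice = df.loc[:, pd.IndexSlice[:, wanted]]

# Spelling 2: tell loc that the whole key applies to the column axis.
by_axis = df.loc(axis=1)[:, wanted]

# Compare the two results and list the selected labels in sorted order.
same = by_slice.equals(by_axis)
selected = sorted(by_slice.columns.tolist())
print(same)
print(selected)
```

Both forms hand the same tuple of per-level selectors to the column MultiIndex, so choosing between them is mostly a matter of readability.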

answered Nov 10 '22 07:11

Scott Boston

We can use Index.isin on a specific level to create a Boolean index and select with loc:

df.loc[:, df.columns.isin(['one', 'two'], level=1)]

Output:

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.042062 -0.233098  0.620974  0.330957
1  0.524495 -0.394930  0.572631  0.499279
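
To see what the Boolean index looks like, the sketch below prints the mask itself. It uses integer data instead of random numbers so the selected values are easy to follow; that substitution and the names `mask` and `subset` are assumptions made for illustration. Because a Boolean mask only filters, the columns keep their original order.

```python
import numpy as np
import pandas as pd

# Same column labels as the question, with integer data for readability.
columns = pd.MultiIndex.from_arrays([
    ['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'],
    ['one', 'two', 'three', 'two', 'one', 'four'],
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# One Boolean per column: True where the level 1 label is 'one' or 'two'.
mask = df.columns.isin(['one', 'two'], level=1)
subset = df.loc[:, mask]

print(mask.tolist())            # [True, True, False, True, True, False]
print(subset.columns.tolist())  # [('bar1', 'one'), ('foo1', 'two'), ('foo2', 'two'), ('bar3', 'one')]
```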

answered Nov 10 '22 06:11

Henry Ecker

You can reindex and specify the level.

df.reindex(['one', 'two'], axis=1, level=1)

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.276056  1.956400 -1.495128  1.582220
1 -0.383178  1.159138 -1.646173  0.821942

answered Nov 10 '22 07:11

ALollz

You can also use the old-fashioned get_level_values approach:

out = df.loc[:, df.columns.get_level_values(1).isin(['one', 'two'])]

Output:

       bar1      foo1      foo2      bar3
        one       two       two       one
0 -0.705540 -1.175132 -0.572076 -1.549703
1  0.277905  1.789925  1.104225  0.104453
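
If this selection is needed in several places, the mask can be wrapped in a small function. The sketch below is only an illustration: `select_columns_by_level` is a hypothetical helper, not a pandas function, and the level names 'group' and 'kind' are assumptions added so the level can be referred to by name. It also checks that the mask from get_level_values matches the one from isin with a level argument.

```python
import numpy as np
import pandas as pd


def select_columns_by_level(frame, values, level):
    """Keep the columns whose label at `level` is in `values` (hypothetical helper)."""
    mask = frame.columns.get_level_values(level).isin(values)
    return frame.loc[:, mask]


# Same labels as the question; the level names are assumed for illustration.
columns = pd.MultiIndex.from_arrays(
    [
        ['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'],
        ['one', 'two', 'three', 'two', 'one', 'four'],
    ],
    names=['group', 'kind'],
)
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

out = select_columns_by_level(df, ['one', 'two'], level='kind')

# The two ways of building the mask agree element by element.
masks_match = np.array_equal(
    df.columns.get_level_values('kind').isin(['one', 'two']),
    df.columns.isin(['one', 'two'], level='kind'),
)
print(out.columns.tolist())
print(masks_match)
```

The `level` argument accepts either a position such as 1 or a level name, since it is passed straight to get_level_values.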

answered Nov 10 '22 05:11

BENY
