Selecting columns from pandas MultiIndex

Tags:

I have DataFrame with MultiIndex columns that looks like this:

# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

sample data

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level?

Currently I am doing it like this:

import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
data.reindex_axis(new_index, axis=1)

expected result

It doesn't feel like a good solution, however, because I have to bust out itertools, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix or xs way of doing this, but everything I tried resulted in errors.

985

asked Aug 27 '13 15:08

metakermit

4 Answers

The most straightforward way is with .loc:

>>> data.loc[:, (['one', 'two'], ['a', 'b'])]


   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

Remember that [] and () have special meaning when dealing with a MultiIndex object:

(...) a tuple is interpreted as one multi-level key

(...) a list is used to specify several keys [on the same level]

(...) a tuple of lists refer to several values within a level

When we write (['one', 'two'], ['a', 'b']), the first list inside the tuple specifies all the values we want from the 1st level of the MultiIndex. The second list inside the tuple specifies all the values we want from the 2nd level of the MultiIndex.

Edit 1: Another possibility is to use slice(None) to specify that we want anything from the first level (works similarly to slicing with : in lists). And then specify which columns from the second level we want.

>>> data.loc[:, (slice(None), ["a", "b"])]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

If the syntax slice(None) does appeal to you, then another possibility is to use pd.IndexSlice, which helps slicing frames with more elaborate indices.

>>> data.loc[:, pd.IndexSlice[:, ["a", "b"]]]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

When using pd.IndexSlice, we can use : as usual to slice the frame.

Source: MultiIndex / Advanced Indexing, How to use slice(None)

104

answered Oct 19 '22 00:10

Guilherme Salomé

It's not great, but maybe:

>>> data
        one                           two                    
          a         b         c         a         b         c
0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
>>> data.loc[:,data.columns.get_level_values(1).isin({"a", "c"})]
        one                 two          
          a         c         a         c
0 -0.927134  0.711426  0.854065  1.140052
1 -0.690745 -0.631856  0.178464 -0.418541
2  1.086432  0.808235 -0.418109  1.886883
3 -0.373822  1.329105  1.774723 -0.617690

would work?

answered Oct 19 '22 00:10

DSM

You can use either, loc or ix I'll show an example with loc:

data.loc[:, [('one', 'a'), ('one', 'c'), ('two', 'a'), ('two', 'c')]]

When you have a MultiIndexed DataFrame, and you want to filter out only some of the columns, you have to pass a list of tuples that match those columns. So the itertools approach was pretty much OK, but you don't have to create a new MultiIndex:

data.loc[:, list(itertools.product(['one', 'two'], ['a', 'c']))]

answered Oct 19 '22 01:10

Viktor Kerkez

I think there is a much better way (now), which is why I bother pulling this question (which was the top google result) out of the shadows:

data.select(lambda x: x[1] in ['a', 'b'], axis=1)

gives your expected output in a quick and clean one-liner:

        one                 two          
          a         b         a         b
0 -0.341326  0.374504  0.534559  0.429019
1  0.272518  0.116542 -0.085850 -0.330562
2  1.982431 -0.420668 -0.444052  1.049747
3  0.162984 -0.898307  1.762208 -0.101360

It is mostly self-explaining, the [1] refers to the level.

answered Oct 19 '22 00:10

FooBar

Related questions
                            
                                Boxplot of Multiple Columns of a Pandas Dataframe on the Same Figure (seaborn)
                            
                                Python Imaging Library - Text rendering
                            
                                python datetime fromtimestamp yielding valueerror year out of range [duplicate]
                            
                                Overriding python threading.Thread.run()
                            
                                Mock Python's built in print function
                            
                                Convert an image RGB->Lab with python
                            
                                In Tkinter is there any way to make a widget invisible?
                            
                                Catch KeyError in Python
                            
                                How to monkeypatch python's datetime.datetime.now with py.test?
                            
                                Easy_install and pip broke: pkg_resources.DistributionNotFound: distribute==0.6.36
                            
                                module' object has no attribute 'drawMatches' opencv python
                            
                                Accessing every 1st element of Pandas DataFrame column containing lists
                            
                                tf.data.Dataset: how to get the dataset size (number of elements in a epoch)?
                            
                                Which tkinter modules were renamed in Python 3?
                            
                                Python File Slurp
                            
                                pass argument to __enter__
                            
                                How to install Python (any version) in Windows when you've no admin privileges?
                            
                                Efficient method to calculate the rank vector of a list in Python
                            
                                Appending values to dictionary in Python
                            
                                How can I stay tab-free in Geany on Ubuntu?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Selecting columns from pandas MultiIndex

Tags:

python

pandas

multi-index

hierarchical

metakermit

People also ask

4 Answers

Guilherme Salomé

DSM

Viktor Kerkez

FooBar

Recent Activity

Donate For Us