I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use <code>loc</code> to slice a <code>DataFrame</code> with a <code>MultiIndex</code>. I'll start with the <code>DataFrame</code> from this SO answer: <pre class="prettyprint"><code> value first second third fourth A0 B0 C1 D0 2 D1 3 C2 D0 6 D1 7 B1 C1 D0 10 D1 11 C2 D0 14 D1 15 A1 B0 C1 D0 18 D1 19 C2 D0 22 D1 23 B1 C1 D0 26 D1 27 C2 D0 30 D1 31 A2 B0 C1 D0 34 D1 35 C2 D0 38 D1 39 B1 C1 D0 42 D1 43 C2 D0 46 D1 47 A3 B0 C1 D0 50 D1 51 C2 D0 54 D1 55 B1 C1 D0 58 D1 59 C2 D0 62 D1 63 </code></pre> To select just <code>A0</code> and <code>C1</code> values, I can do: <pre class="prettyprint"><code>In [26]: df.loc['A0', :, 'C1', :] Out[26]: value first second third fourth A0 B0 C1 D0 2 D1 3 B1 C1 D0 10 D1 11 </code></pre> Which also works selecting from three levels, and even with tuples: <pre class="prettyprint"><code>In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1'] Out[28]: value first second third fourth A0 B0 C1 D1 3 C2 D1 5 B1 C1 D1 11 C2 D1 13 </code></pre> So far, intuitive and brilliant. So why can't I select all values from the first index level? <pre class="prettyprint"><code>In [30]: df.loc[:, :, 'C1', :] --------------------------------------------------------------------------- IndexingError Traceback (most recent call last) <ipython-input-30-57b56108d941> in <module>() ----> 1 df.loc[:, :, 'C1', :] /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key) 1176 def __getitem__(self, key): 1177 if type(key) is tuple: -> 1178 return self._getitem_tuple(key) 1179 else: 1180 return self._getitem_axis(key, axis=0) /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup) 694 695 # no multi-index, so validate all of the indexers --> 696 self._has_valid_tuple(tup) 697 698 # ugly hack for GH #836 /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key) 125 for i, k in enumerate(key): 126 if i >= self.obj.ndim: --> 127 raise IndexingError('Too many indexers') 128 if not self._has_valid_type(k, i): 129 raise ValueError("Location based indexing can only have [%s] " IndexingError: Too many indexers </code></pre> Surely this is not intended behaviour? Note: I know this is possible with <code>df.xs('C1', level='third')</code> but the current <code>.loc</code> behaviour seems inconsistent.

The reason this doesn't work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is to simply do this: <pre class="prettyprint"><code>df.loc(axis=0)[:, :, 'C1', :] </code></pre> Pandas gets confused sometimes when indexes are similar or contain similar values. If you were to have a column named 'C1' or something you would also need to do this under this style of slicing/selecting.

"Too many indexers" with DataFrame.loc

Tags:

python

pandas

I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.

I'll start with the DataFrame from this SO answer:

                           value first second third fourth        A0    B0     C1    D0          2                    D1          3              C2    D0          6                    D1          7       B1     C1    D0         10                    D1         11              C2    D0         14                    D1         15 A1    B0     C1    D0         18                    D1         19              C2    D0         22                    D1         23       B1     C1    D0         26                    D1         27              C2    D0         30                    D1         31 A2    B0     C1    D0         34                    D1         35              C2    D0         38                    D1         39       B1     C1    D0         42                    D1         43              C2    D0         46                    D1         47 A3    B0     C1    D0         50                    D1         51              C2    D0         54                    D1         55       B1     C1    D0         58                    D1         59              C2    D0         62                    D1         63

To select just A0 and C1 values, I can do:

In [26]: df.loc['A0', :, 'C1', :] Out[26]:                             value first second third fourth        A0    B0     C1    D0          2                    D1          3       B1     C1    D0         10                    D1         11

Which also works selecting from three levels, and even with tuples:

In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1'] Out[28]:                             value first second third fourth        A0    B0     C1    D1          3              C2    D1          5       B1     C1    D1         11              C2    D1         13

So far, intuitive and brilliant.

So why can't I select all values from the first index level?

In [30]: df.loc[:, :, 'C1', :] --------------------------------------------------------------------------- IndexingError                             Traceback (most recent call last) <ipython-input-30-57b56108d941> in <module>() ----> 1 df.loc[:, :, 'C1', :]  /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)    1176     def __getitem__(self, key):    1177         if type(key) is tuple: -> 1178             return self._getitem_tuple(key)    1179         else:    1180             return self._getitem_axis(key, axis=0)  /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)     694      695         # no multi-index, so validate all of the indexers --> 696         self._has_valid_tuple(tup)     697      698         # ugly hack for GH #836  /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)     125         for i, k in enumerate(key):     126             if i >= self.obj.ndim: --> 127                 raise IndexingError('Too many indexers')     128             if not self._has_valid_type(k, i):     129                 raise ValueError("Location based indexing can only have [%s] "  IndexingError: Too many indexers

Surely this is not intended behaviour?

Note: I know this is possible with df.xs('C1', level='third') but the current .loc behaviour seems inconsistent.

420

asked Jun 11 '15 12:06

LondonRob

1 Answers

The reason this doesn't work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is to simply do this:

df.loc(axis=0)[:, :, 'C1', :]

Pandas gets confused sometimes when indexes are similar or contain similar values. If you were to have a column named 'C1' or something you would also need to do this under this style of slicing/selecting.

148

answered Oct 02 '22 19:10

djakubosky

Related questions
                            
                                Example of the right way to use QThread in PyQt?
                            
                                How to expand all the subsections on the sidebar toctree in Sphinx?
                            
                                How to work within individual files rather than projects in PyCharm
                            
                                Execute terminal commands in python3 [duplicate]
                            
                                conda environment to AWS Lambda
                            
                                git log --follow, the gitpython way
                            
                                Understanding matplotlib: plt, figure, ax(arr)?
                            
                                Python Json loads() returning string instead of dictionary?
                            
                                How to connect PyCharm to a python interpreter located inside a Docker container?
                            
                                What is the equivalent of a Python docstring in Ruby?
                            
                                How to limit python traceback to specific files
                            
                                Kmeans without knowing the number of clusters? [duplicate]
                            
                                Relative imports with unittest in Python
                            
                                Detect if a text image is upside down
                            
                                Given the name of a Python package, what is the name of the module to import?
                            
                                numerically stable way to multiply log probability matrices in numpy
                            
                                How to obtain the same font(-style, -size etc.) in matplotlib output as in latex output?
                            
                                Name of a function returning a generator
                            
                                Scrapy Python Set up User Agent
                            
                                Necessity of explicit cursor.close()

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With