
"Too many indexers" with DataFrame.loc

Tags: python, pandas

I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.

I'll start with the DataFrame from this SO answer:

                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
             C2    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C2    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C2    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C2    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C2    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C2    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C2    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C2    D0         62
                   D1         63
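For anyone who wants to reproduce the examples, here is a minimal sketch that rebuilds a frame with the same index and values as the one printed above. It is not the code from the linked answer: the value formula is an assumption worked out from the printed output.

```python
import pandas as pd

# Four index levels, named as in the printed frame.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)

# Assumed formula that matches the printed values:
# 16 per 'first' step, 8 per 'second' step, offset 2 for C1 or 6 for C2,
# plus 0 for D0 or 1 for D1.
values = [
    16 * a + 8 * b + c + d
    for a in range(4)
    for b in range(2)
    for c in (2, 6)
    for d in range(2)
]

df = pd.DataFrame({"value": values}, index=index)
```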

To select just A0 and C1 values, I can do:

In [26]: df.loc['A0', :, 'C1', :]
Out[26]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11

Which also works selecting from three levels, and even with tuples:

In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]:
                           value
first second third fourth
A0    B0     C1    D1          3
             C2    D1          5
      B1     C1    D1         11
             C2    D1         13

So far, intuitive and brilliant.

So why can't I select all values from the first index level?

In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-30-57b56108d941> in <module>()
----> 1 df.loc[:, :, 'C1', :]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697
    698         # ugly hack for GH #836

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers

Surely this is not intended behaviour?

Note: I know this is possible with df.xs('C1', level='third') but the current .loc behaviour seems inconsistent.
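As a rough sketch of the xs route mentioned in the note: the frame below is a reconstruction of the printed one (the value formula is an assumption), and the names c1_dropped and c1_full are just illustrative. By default xs drops the level it selects on; drop_level=False keeps all four levels, which is closer to what the .loc slicers return.

```python
import pandas as pd

# Reconstruction of the frame from the question (value formula is assumed).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
values = [
    16 * a + 8 * b + c + d
    for a in range(4)
    for b in range(2)
    for c in (2, 6)
    for d in range(2)
]
df = pd.DataFrame({"value": values}, index=index)

# Cross-section on the 'third' level; the selected level is dropped by default.
c1_dropped = df.xs("C1", level="third")

# Keep the 'third' level in the result.
c1_full = df.xs("C1", level="third", drop_level=False)
```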

LondonRob asked Jun 11 '15 12:06

1 Answer

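This fails because .loc on a DataFrame can index both rows and columns, so you need to tell it which axis your slicers apply to (see the section on slicers in http://pandas.pydata.org/pandas-docs/stable/advanced.html). Without that, the four indexers can end up being validated one per axis, and a DataFrame only has two axes, which is the check in the traceback that raises 'Too many indexers'. An alternative to xs is to simply specify the axis: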

df.loc(axis=0)[:, :, 'C1', :] 

Pandas cannot always tell whether a tuple of indexers is meant for the row MultiIndex alone or for the rows and columns together. If you also had a column named 'C1', for example, you would need to specify the axis in the same way with this style of slicing/selecting.
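Here is a small sketch comparing the axis=0 form with two other ways of spelling out both axes. The frame is a reconstruction of the one in the question (the value formula is an assumption), and the variable names are illustrative only. The traceback in the question comes from an older pandas on Python 2.7, so the sketch does not rely on the bare df.loc[:, :, 'C1', :] raising.

```python
import pandas as pd

# Reconstruction of the frame from the question (value formula is assumed).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
values = [
    16 * a + 8 * b + c + d
    for a in range(4)
    for b in range(2)
    for c in (2, 6)
    for d in range(2)
]
df = pd.DataFrame({"value": values}, index=index)

# Form from the answer: every slicer applies to the rows.
by_axis = df.loc(axis=0)[:, :, "C1", :]

# Both axes spelled out: row slicers wrapped in pd.IndexSlice,
# followed by ":" to take all columns.
idx = pd.IndexSlice
by_index_slice = df.loc[idx[:, :, "C1", :], :]

# The same row key written as a plain tuple of slice objects.
by_tuple = df.loc[(slice(None), slice(None), "C1", slice(None)), :]
```

All three select the 16 rows whose 'third' level is C1 and keep all four index levels.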

djakubosky answered Oct 02 '22 19:10
