
pandas: slice a MultiIndex by range of secondary index

Tags: python, pandas

I have a series with a MultiIndex like this:

    import numpy as np
    import pandas as pd

    buckets = np.repeat(['a','b','c'], [3,5,1])
    sequence = [0,1,5,0,1,2,4,50,0]

    s = pd.Series(
        np.random.randn(len(sequence)),
        index=pd.MultiIndex.from_tuples(zip(buckets, sequence))
    )

    # In [6]: s
    # Out[6]:
    # a  0    -1.106047
    #    1     1.665214
    #    5     0.279190
    # b  0     0.326364
    #    1     0.900439
    #    2    -0.653940
    #    4     0.082270
    #    50   -0.255482
    # c  0    -0.091730

I'd like to get the s['b'] values where the second index ('sequence') is between 2 and 10.

Slicing on the first index works fine:

    s['a':'b']
    # Out[109]:
    # bucket  value
    # a       0        1.828176
    #         1        0.160496
    #         5        0.401985
    # b       0       -1.514268
    #         1       -0.973915
    #         2        1.285553
    #         4       -0.194625
    #         5       -0.144112

But not on the second, at least by what seems to be the two most obvious ways:

1) This returns the elements at positions 1 through 4, regardless of the index values:

    s['b'][1:10]

    # In [61]: s['b'][1:10]
    # Out[61]:
    # 1     0.900439
    # 2    -0.653940
    # 4     0.082270
    # 50   -0.255482

However, if I reverse the index and the first index is integer and the second index is a string, it works:

    In [26]: s
    Out[26]:
    0   a   -0.126299
    1   a    1.810928
    5   a    0.571873
    0   b   -0.116108
    1   b   -0.712184
    2   b   -1.771264
    4   b    0.148961
    50  b    0.089683
    0   c   -0.582578

    In [25]: s[0]['a':'b']
    Out[25]:
    a   -0.126299
    b   -0.116108
alaiacano asked Nov 14 '12 23:11




2 Answers

As Robbie-Clarken answers, since 0.14 you can pass a slice in the tuple you pass to loc:

    In [11]: s.loc[('b', slice(2, 10))]
    Out[11]:
    b  2   -0.65394
       4    0.08227
    dtype: float64

Indeed, you can pass a slice for each level:

    In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
    Out[12]:
    a  5    0.27919
    b  2   -0.65394
       4    0.08227
    dtype: float64

Note: the slice is inclusive.
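To make the inclusive, label-based behaviour easy to check, here is a minimal sketch that rebuilds the series from the question with deterministic values (0 to 8 instead of random numbers, which is an assumption made only so the results are predictable). The level names `bucket` and `sequence` and the variable names are illustrative. `pd.IndexSlice` is shown as an alternative spelling of the same tuple of slices.

```python
import pandas as pd

# Same index layout as in the question, but with predictable values 0..8.
buckets = ["a"] * 3 + ["b"] * 5 + ["c"]
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
index = pd.MultiIndex.from_arrays(
    [buckets, sequence], names=["bucket", "sequence"]
)
s = pd.Series(range(len(sequence)), index=index)

# Bucket 'b', second-level labels between 2 and 10 (both ends inclusive).
one_bucket = s.loc[("b", slice(2, 10))]

# A slice on every level: buckets 'a' through 'b', labels 2 through 10.
both_levels = s.loc[(slice("a", "b"), slice(2, 10))]

# pd.IndexSlice builds the same tuple of slices with subscript syntax.
idx = pd.IndexSlice
with_indexslice = s.loc[idx["a":"b", 2:10]]

print(one_bucket)
print(both_levels)
```

Slicing a MultiIndex by label like this relies on the index being sorted, which it is here; for an unsorted index, calling `sort_index()` first is the usual remedy.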


Old answer:

You can also do this using:

s.ix[1:10, "b"] 

(It's good practice to do this in a single ix/loc/iloc call, since that form allows assignment.)

This answer was written before iloc (position/integer location) was introduced in early 2013; iloc may be preferred in this case. It was created to remove the ambiguity from integer-indexed pandas objects and to be more descriptive: "I'm slicing on position".

s["b"].iloc[1:10] 

That said, I kinda disagree with the docs that ix is:

most robust and consistent way

It's not; the most consistent way is to describe what you're doing:

  • use loc for labels
  • use iloc for position
  • use ix for both (if you really have to)

Remember the zen of python:

explicit is better than implicit
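The difference between label and position slicing can be seen on a small stand-in for `s['b']` from the question. This is only a sketch: the values 10 to 14 and the variable names are made up for illustration. `ix` is left out because it was removed in pandas 1.0.

```python
import pandas as pd

# Stand-in for s['b']: integer labels that do not match positions.
b = pd.Series([10, 11, 12, 13, 14], index=[0, 1, 2, 4, 50])

# loc slices by label, inclusive on both ends: labels 1, 2 and 4.
by_label = b.loc[1:10]

# iloc slices by position, end-exclusive: positions 1 through 4,
# which happen to carry the labels 1, 2, 4 and 50.
by_position = b.iloc[1:10]

print(by_label)
print(by_position)
```

The positional result carries the labels 1, 2, 4 and 50, which matches what the question saw from `s['b'][1:10]`.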

Andy Hayden answered Sep 21 '22 05:09


As of pandas 0.14.0 it is possible to slice multi-indexed objects by providing .loc a tuple containing slice objects:

    In [2]: s.loc[('b', slice(2, 10))]
    Out[2]:
    b  2   -1.206052
       4   -0.735682
    dtype: float64
Robbie Clarken answered Sep 21 '22 05:09