
Python pandas, how can I pass a colon ":" to an indexer through a variable

Tags:

python

pandas

I am working through a method that ultimately will be working with data slices from a large multi-index pandas dataframe. I can generate masks to use for each indexer (essentially lists of values to define the slice):

df.loc[idx[a_mask,b_mask],idx[c_mask,d_mask]]

This would be fine but in some scenarios I'd really like to select everything along some of those axes, something equivalent to:

df.loc[idx[a_mask,b_mask],idx[:,d_mask]]

Is there a way to pass the colon ":" that replaces c_mask in the second example through a variable? Ideally I'd just set c_mask to a value like ":", but of course that doesn't work (and shouldn't, because a column could legitimately have that name). Is there any value I can pass by variable that communicates "whole axis" for one of those indexers?

I do realize I could build a mask that selects everything by gathering all the values along the appropriate axis, but this is nontrivial and adds a lot of code. Likewise, I could split the dataframe access into 5 scenarios (one for each position holding a : and one with all four masks), but that doesn't honor the DRY principle and is still brittle, because it can't handle whole-axis selection in more than one position at once.

So, is there anything I can pass in through a variable that selects an entire level in an indexer the way a : would? Or is there a more elegant way to optionally select an entire level?

Ezekiel Kruglick asked Feb 09 '23 23:02


1 Answer

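idx[slice(None)] is equivalent to idx[:], so slice(None) can be stored in a variable and passed wherever you would otherwise write a bare colon.

The three selections below are all equivalent.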

In [11]: df = pd.DataFrame({'A' : np.random.randn(9)},index=pd.MultiIndex.from_product([range(3),list('abc')],names=['first','second']))

In [12]: df
Out[12]: 
                     A
first second          
0     a      -0.668344
      b      -1.679159
      c       0.061876
1     a      -0.237272
      b       0.136495
      c      -1.296027
2     a       0.554533
      b       0.433941
      c      -0.014107

In [13]: idx = pd.IndexSlice

In [14]: df.loc[idx[:,'b'],]
Out[14]: 
                     A
first second          
0     b      -1.679159
1     b       0.136495
2     b       0.433941

In [15]: df.loc[idx[slice(None),'b'],]
Out[15]: 
                     A
first second          
0     b      -1.679159
1     b       0.136495
2     b       0.433941

In [16]: df.loc[(slice(None),'b'),]
Out[16]: 
                     A
first second          
0     b      -1.679159
1     b       0.136495
2     b       0.433941
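
Applied to the four-mask case from the question, this might look like the sketch below. The frame, the level names, the ALL constant, and the select helper are all made up for illustration; the only pandas pieces relied on are pd.IndexSlice and .loc with MultiIndex slicers on both axes, which need a lexsorted index (from_product with sorted inputs provides one).

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice

# Hypothetical frame with a two-level MultiIndex on both rows and columns.
rows = pd.MultiIndex.from_product([range(3), list("abc")], names=["first", "second"])
cols = pd.MultiIndex.from_product([["x", "y"], ["p", "q"]], names=["outer", "inner"])
df = pd.DataFrame(np.arange(36).reshape(9, 4), index=rows, columns=cols)

# slice(None) compares equal to what a bare ":" produces inside idx[...].
ALL = slice(None)


def select(frame, a_mask=ALL, b_mask=ALL, c_mask=ALL, d_mask=ALL):
    """Slice rows by (a_mask, b_mask) and columns by (c_mask, d_mask).

    Any mask left at its default selects the whole level.
    """
    return frame.loc[idx[a_mask, b_mask], idx[c_mask, d_mask]]


# c_mask is omitted, so every value of the outer column level is kept.
subset = select(df, a_mask=[0, 2], b_mask=["a", "b"], d_mask=["q"])
print(subset)
```

Because every mask defaults to slice(None), a single code path covers any combination of "whole level" selections, so there is no need to branch per scenario.
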
Jeff answered Feb 12 '23 13:02