
Select data at a particular level from a MultiIndex

Tags: python, pandas

I have the following Pandas Dataframe with a MultiIndex(Z,A):

             H1       H2  
   Z    A 
0  100  200  0.3112   -0.4197   
1  100  201  0.2967   0.4893    
2  100  202  0.3084   -0.4873   
3  100  203  0.3069   NaN        
4  101  203  -0.4956  NaN       

Question: How can I select all items with A=203? I tried df[:,'A'] but it doesn't work. Then I found this in the online documentation so I tried:
df.xs(203,level='A')
but I get:
"TypeError: xs() got an unexpected keyword argument 'level'"
Also, I don't see this parameter in the installed docs (df.xs?):

Parameters
----------
key : object
    Some label contained in the index, or partially in a MultiIndex
axis : int, default 0
    Axis to retrieve cross-section on
copy : boolean, default True
    Whether to make a copy of the data

Note: I have the development version.

Edit: I found this thread. They recommend something like:

df.select(lambda x: x[1]==200, axis=0)  

I still would like to know what happened with df.xs with the level parameter or what is the recommended way in the current version.

asked Apr 16 '12 13:04 by elyase




2 Answers

The problem was my incorrect assumption that I was running the development version, when in reality I had 1.6.1 installed. You can check the installed version with:

import pandas
print(pandas.__version__)

In the current version, df.xs() with the level parameter works fine.
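
For reference, here is a minimal sketch of that call on the data from the question. It assumes a reasonably recent pandas; the frame is rebuilt by hand from the table above, and the variable names `selected` and `selected_full` are only illustrative.

```python
import numpy as np
import pandas as pd

# Data copied from the question; Z and A form the MultiIndex.
df = pd.DataFrame(
    {
        "Z": [100, 100, 100, 100, 101],
        "A": [200, 201, 202, 203, 203],
        "H1": [0.3112, 0.2967, 0.3084, 0.3069, -0.4956],
        "H2": [-0.4197, 0.4893, -0.4873, np.nan, np.nan],
    }
).set_index(["Z", "A"])

# Cross-section on level "A"; the selected level is dropped by default,
# so the result is indexed by Z only.
selected = df.xs(203, level="A")

# drop_level=False keeps both index levels in the result.
selected_full = df.xs(203, level="A", drop_level=False)

print(selected)
print(selected_full)
```

If the TypeError from the question still shows up, the installed pandas is probably older than the documentation being read.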

answered Sep 18 '22 09:09 by elyase


Not a direct answer to the question, but if you want to select more than one value you can use the "slice()" notation:

import numpy
from pandas import  MultiIndex, Series

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = Series(numpy.random.randn(8), index=index)

In [10]: s
Out[10]:
first  second
bar    one       0.181621
       two       1.016225
baz    one       0.716589
       two      -0.353731
foo    one      -0.326301
       two       1.009143
qux    one       0.098225
       two      -1.087523
dtype: float64

In [11]: s.loc[slice(None)]
Out[11]:
first  second
bar    one       0.181621
       two       1.016225
baz    one       0.716589
       two      -0.353731
foo    one      -0.326301
       two       1.009143
qux    one       0.098225
       two      -1.087523
dtype: float64

In [12]: s.loc[slice(None), "one"]
Out[12]:
first
bar      0.181621
baz      0.716589
foo     -0.326301
qux      0.098225
dtype: float64

In [13]: s.loc["bar", slice(None)]
Out[13]:
first  second
bar    one       0.181621
       two       1.016225
dtype: float64
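
The same slice() idea should carry over to the DataFrame from the question. The sketch below is an assumption-laden illustration rather than part of the original answer: it rebuilds the frame by hand, sorts the index first (slicing a MultiIndex generally expects a sorted index), and selects two values of A at once. The name `subset` is hypothetical.

```python
import numpy as np
import pandas as pd

# Data copied from the question; Z and A form the MultiIndex.
df = pd.DataFrame(
    {
        "Z": [100, 100, 100, 100, 101],
        "A": [200, 201, 202, 203, 203],
        "H1": [0.3112, 0.2967, 0.3084, 0.3069, -0.4956],
        "H2": [-0.4197, 0.4893, -0.4873, np.nan, np.nan],
    }
).set_index(["Z", "A"]).sort_index()

# slice(None) takes every Z; the list picks several values of A.
# The trailing ":" selects all columns.
subset = df.loc[(slice(None), [202, 203]), :]

# pd.IndexSlice is a more readable spelling of the same selection.
same_subset = df.loc[pd.IndexSlice[:, [202, 203]], :]

print(subset)
```

Unlike xs, this keeps both index levels in the result, which can be convenient when more than one value is selected.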
answered Sep 18 '22 09:09 by rogueleaderr