Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas selection by MultiIndex

Question about selecting rows by MultiIndex.

df:

              colA   ColB

   A1   B1   1  ca1  cb1
             2  ca2  cb2
        B2   1  ca3  cb3
             2  ca4  cb4 
   A2   B1   1  ca5  cb5
             2  ca6  cb6
        B2   1  ca7  cb7
             2  ca8  cb8

I want to select colA from A2/B1 where MultiIndex levels(2) > 1. I'm doing it in multiple steps:

df1=df.loc[A2,[B7],:,]  

for some reason I have to present level(1) in brackets

df1.index = x1.index.droplevel(0)

df1.index = x1.index.droplevel(0)

df1.loc[1:,'colA']

I know it is way too much but I cant figure how to do it simpler. Ways I can think of don't want to work.

like image 797
Ranny Avatar asked Dec 24 '22 11:12

Ranny


2 Answers

Since you're working with unnamed index columns, one way is using query on ilevel_* -

df.query("ilevel_0 == 'A2' and ilevel_1 == 'B1' and ilevel_2 > 1")

        colA ColB
A2 B1 2  ca6  cb6

Another method is using index.get_level_values -

f = df.index.get_level_values
df.loc[(f(0) == 'A2') & (f(1) == 'B1') & (f(2) > 1)]

        colA ColB
A2 B1 2  ca6  cb6
like image 156
cs95 Avatar answered Dec 28 '22 07:12

cs95


You can also do it in two operations. First use loc to select parts in A2 and B1, and then do another query at the remaining index, by setting query('index > 1').

df = pd.DataFrame({'ColB': {('A1', 'B1', 1): 'cb1',
  ('A1', 'B1', 2): 'cb2',
  ('A1', 'B2', 1): 'cb3',
  ('A1', 'B2', 2): 'cb4',
  ('A2', 'B1', 1): 'cb5',
  ('A2', 'B1', 2): 'cb6',
  ('A2', 'B2', 1): 'cb7',
  ('A2', 'B2', 2): 'cb8'},
 'colA': {('A1', 'B1', 1): 'ca1',
  ('A1', 'B1', 2): 'ca2',
  ('A1', 'B2', 1): 'ca3',
  ('A1', 'B2', 2): 'ca4',
  ('A2', 'B1', 1): 'ca5',
  ('A2', 'B1', 2): 'ca6',
  ('A2', 'B2', 1): 'ca7',
  ('A2', 'B2', 2): 'ca8'}})

# Answer here
df.loc["A2", "B1"].query("index > 1")
Output:
    ColB    colA
 2  cb6     ca6
like image 42
Tai Avatar answered Dec 28 '22 05:12

Tai