 

Accessing one level of a multi-index in Pandas

I have a dataframe that seems like a simple use case for a MultiIndex: I have ISO week numbers and dates as the index, and I'd like to filter by a specific week. Following the instructions in the docs, it looks like I ought to be able to index just by passing the week number as a string. However, this raises a KeyError.

MCVE:

import numpy as np
import pandas as pd

data = {'foo': {('2016_32', '2016-08-07'): 0.14285714285714285,
  ('2016_32', '2016-08-08'): 0.14285714285714285,
  ('2016_32', '2016-08-09'): 0.14285714285714285,
  ('2016_32', '2016-08-10'): 0.14285714285714285,
  ('2016_32', '2016-08-11'): 0.14285714285714285,
  ('2016_32', '2016-08-12'): 0.14285714285714285,
  ('2016_32', '2016-08-13'): 0.14285714285714285,
  ('2016_36', '2016-09-04'): 0.14285714285714285,
  ('2016_36', '2016-09-05'): 0.14285714285714285,
  ('2016_36', '2016-09-06'): 0.14285714285714285,
  ('2016_36', '2016-09-07'): 0.14285714285714285,
  ('2016_36', '2016-09-08'): 0.14285714285714285,
  ('2016_36', '2016-09-09'): 0.14285714285714285},
 'bar': {('2016_32', '2016-08-07'): np.nan,
  ('2016_32', '2016-08-08'): np.nan,
  ('2016_32', '2016-08-09'): np.nan,
  ('2016_32', '2016-08-10'): np.nan,
  ('2016_32', '2016-08-11'): np.nan,
  ('2016_32', '2016-08-12'): np.nan,
  ('2016_32', '2016-08-13'): np.nan,
  ('2016_36', '2016-09-04'): 0.0,
  ('2016_36', '2016-09-05'): 0.0,
  ('2016_36', '2016-09-06'): 0.0,
  ('2016_36', '2016-09-07'): 0.0,
  ('2016_36', '2016-09-08'): 0.0,
  ('2016_36', '2016-09-09'): 0.0}}

df = pd.DataFrame(data)
df['2016_32']

KeyError: '2016_32'
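The KeyError comes from how [] works on a DataFrame: a single label is looked up among the column labels, and '2016_32' only appears in the first level of the row index. A minimal sketch on a two-row stand-in for the frame above (the trimmed data is an assumption for brevity):

```python
import numpy as np
import pandas as pd

# A two-row stand-in for the question's frame (trimmed for illustration)
idx = pd.MultiIndex.from_tuples([('2016_32', '2016-08-07'),
                                 ('2016_36', '2016-09-04')])
df = pd.DataFrame({'foo': [1 / 7, 1 / 7], 'bar': [np.nan, 0.0]}, index=idx)

# df[...] with a single label searches the column labels ...
print(list(df.columns))                    # ['foo', 'bar']
# ... but the week numbers live in the first level of the row index
print(list(df.index.get_level_values(0)))  # ['2016_32', '2016_36']

try:
    df['2016_32']
except KeyError as err:
    print('KeyError:', err)                # KeyError: '2016_32'

# .loc looks in the row index instead, so this works
print(df.loc['2016_32'])
```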
Asked by Josh Friedlander, Jan 27 '23 23:01


2 Answers

In general, to select from a MultiIndex use DataFrame.xs:

# xs works on the first level by default, so no level argument is needed
print (df.xs('2016_32'))
# to select by the second level instead:
#print (df.xs('2016-09-07', level=1))
                 foo  bar
2016-08-07  0.142857  NaN
2016-08-08  0.142857  NaN
2016-08-09  0.142857  NaN
2016-08-10  0.142857  NaN
2016-08-11  0.142857  NaN
2016-08-12  0.142857  NaN
2016-08-13  0.142857  NaN
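A short sketch of the two xs options mentioned in the comments above, on a hypothetical three-row frame shaped like the question's: level=1 matches on the dates, and drop_level=False keeps the matched level in the result instead of dropping it.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same (week, date) index layout
idx = pd.MultiIndex.from_tuples([('2016_32', '2016-08-07'),
                                 ('2016_32', '2016-08-08'),
                                 ('2016_36', '2016-09-07')])
df = pd.DataFrame({'foo': [1 / 7] * 3, 'bar': [np.nan, np.nan, 0.0]}, index=idx)

# Select on the second level (the dates); the date level is dropped
by_date = df.xs('2016-09-07', level=1)
print(by_date)

# drop_level=False keeps both index levels in the result
week = df.xs('2016_32', drop_level=False)
print(week.index.nlevels)  # 2
```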

Or use loc:

# no extra arguments are needed to select by the first level
print (df.loc['2016_32'])
# to select by the second level, pass axis=0 and use : to take all values of the first level
print (df.loc(axis=0)[:, '2016-09-07'])
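The same second-level selection can also be written with pd.IndexSlice, which avoids the loc(axis=0) form. A minimal sketch on a hypothetical two-row frame; the sort_index call is there because pandas can raise UnsortedIndexError when slicing an unsorted MultiIndex.

```python
import pandas as pd

# Hypothetical two-row frame with a (week, date) MultiIndex
idx = pd.MultiIndex.from_tuples([('2016_32', '2016-08-07'),
                                 ('2016_36', '2016-09-07')])
df = pd.DataFrame({'foo': [0.1, 0.2]}, index=idx)

# Slicing a MultiIndex level is safest on a sorted index
df = df.sort_index()

ix = pd.IndexSlice
a = df.loc(axis=0)[:, '2016-09-07']   # form used above
b = df.loc[ix[:, '2016-09-07'], :]    # equivalent IndexSlice form
print(a.equals(b))  # True
```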

Selection works differently depending on whether the MultiIndex is on the columns or on the rows:

import numpy as np
import pandas as pd

np.random.seed(235)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
     np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E','F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print (df)
    bar     baz     foo     qux    
    one two one two one two one two
A E   8   1   5   8   3   5   3   3
  F   3   1   3   6   6   1   0   2
B E   0   3   1   7   0   0   8   2
  F   6   7   7   4   2   7   7   5
C E   7   3   1   7   3   9   7   3
  F   8   2   0   8   5   2   2   0

# select all columns under the top-level label 'bar'
print (df['bar'])
     one  two
A E    8    1
  F    3    1
B E    0    3
  F    6    7
C E    7    3
  F    8    2

# select 'bar', then 'one' from the result (chained indexing)
print (df['bar']['one'])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: one, dtype: int32

# select a single column with a tuple of level values
print (df[('bar', 'one')])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: (bar, one), dtype: int32
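Two related column selections, sketched on a hypothetical deterministic frame (np.arange instead of random numbers so the values are predictable): a single loc call replaces the chained df['bar']['one'], and xs with axis=1 picks one second-level label across every top-level column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with MultiIndex on both rows and columns
cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B'], ['E', 'F']])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# One loc call instead of chained df['bar']['one']
s = df.loc[:, ('bar', 'one')]

# Every 'one' column, whatever its top-level label
ones = df.xs('one', axis=1, level=1)
print(list(ones.columns))  # ['bar', 'baz']
```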

To select rows (MultiIndex on the index), use loc:

print (df.loc['A'])
  bar     baz     foo     qux    
  one two one two one two one two
E   8   1   5   8   3   5   3   3
F   3   1   3   6   6   1   0   2

print (df.loc['A'].loc['F'])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: F, dtype: int32

print (df.loc[('A', 'F')])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: (A, F), dtype: int32
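For rows, xs with level=1 selects one inner label across all outer labels, and combining a full row tuple with a full column tuple in loc returns a single cell. A sketch on a hypothetical np.arange frame so the numbers are predictable:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with MultiIndex on both rows and columns
cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E', 'F']])
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=rows, columns=cols)

# Every 'F' row, whatever the first level is; the 'F' level is dropped
f_rows = df.xs('F', level=1)
print(list(f_rows.index))  # ['A', 'B', 'C']

# A single cell: full row tuple plus full column tuple
print(df.loc[('A', 'F'), ('bar', 'one')])  # 4
```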
Answered by jezrael, Feb 22 '23 20:02


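Alternatively, you can slice the rows by position. Note that swaplevel(0, 0) swaps level 0 with itself, so it leaves the index unchanged; the selection comes entirely from df[:7], which takes the first seven rows (all of week 2016_32):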

>>> df[:7].swaplevel(0, 0, axis=0)
                         foo  bar
2016_32 2016-08-07  0.142857  NaN
        2016-08-08  0.142857  NaN
        2016-08-09  0.142857  NaN
        2016-08-10  0.142857  NaN
        2016-08-11  0.142857  NaN
        2016-08-12  0.142857  NaN
        2016-08-13  0.142857  NaN
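To see what swaplevel actually does, here is a small sketch on a hypothetical two-row frame: swapping a level with itself is a no-op, while swapping levels 0 and 1 puts the dates first, after which loc can select by date directly (sort_index is added so the swapped index is ordered).

```python
import pandas as pd

# Hypothetical two-row frame with a (week, date) MultiIndex
idx = pd.MultiIndex.from_tuples([('2016_32', '2016-08-07'),
                                 ('2016_36', '2016-09-04')])
df = pd.DataFrame({'foo': [0.1, 0.2]}, index=idx)

# Swapping level 0 with itself leaves the index unchanged
print(df.swaplevel(0, 0, axis=0).index.equals(df.index))  # True

# Swapping levels 0 and 1 puts the dates first
swapped = df.swaplevel(0, 1, axis=0).sort_index()
print(swapped.loc['2016-09-04'])
```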

Or, without the swaplevel call, simply:

>>> df[:7]
                         foo  bar
2016_32 2016-08-07  0.142857  NaN
        2016-08-08  0.142857  NaN
        2016-08-09  0.142857  NaN
        2016-08-10  0.142857  NaN
        2016-08-11  0.142857  NaN
        2016-08-12  0.142857  NaN
        2016-08-13  0.142857  NaN
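Positional slicing only works if you already know which rows belong to the week. A label-based alternative is a boolean mask on the first level's values; a sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with a (week, date) MultiIndex
idx = pd.MultiIndex.from_tuples([('2016_32', '2016-08-07'),
                                 ('2016_32', '2016-08-08'),
                                 ('2016_36', '2016-09-04')])
df = pd.DataFrame({'foo': [1 / 7] * 3, 'bar': [np.nan, np.nan, 0.0]}, index=idx)

# Compare the first level's values to the week; no row positions needed
mask = df.index.get_level_values(0) == '2016_32'
week = df[mask]
print(len(week))  # 2
```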
Answered by Karn Kumar, Feb 22 '23 22:02