I have a DataFrame that seems like a simple use case for a MultiIndex: the index holds ISO week numbers and dates, and I'd like to filter by a specific week. Following the instructions in the docs, it looks like I ought to be able to select a week just by passing the week number as a string. However, this raises a KeyError.
MCVE:
import numpy as np
import pandas as pd

data = {'foo': {('2016_32', '2016-08-07'): 0.14285714285714285,
('2016_32', '2016-08-08'): 0.14285714285714285,
('2016_32', '2016-08-09'): 0.14285714285714285,
('2016_32', '2016-08-10'): 0.14285714285714285,
('2016_32', '2016-08-11'): 0.14285714285714285,
('2016_32', '2016-08-12'): 0.14285714285714285,
('2016_32', '2016-08-13'): 0.14285714285714285,
('2016_36', '2016-09-04'): 0.14285714285714285,
('2016_36', '2016-09-05'): 0.14285714285714285,
('2016_36', '2016-09-06'): 0.14285714285714285,
('2016_36', '2016-09-07'): 0.14285714285714285,
('2016_36', '2016-09-08'): 0.14285714285714285,
('2016_36', '2016-09-09'): 0.14285714285714285},
'bar': {('2016_32', '2016-08-07'): np.nan,
('2016_32', '2016-08-08'): np.nan,
('2016_32', '2016-08-09'): np.nan,
('2016_32', '2016-08-10'): np.nan,
('2016_32', '2016-08-11'): np.nan,
('2016_32', '2016-08-12'): np.nan,
('2016_32', '2016-08-13'): np.nan,
('2016_36', '2016-09-04'): 0.0,
('2016_36', '2016-09-05'): 0.0,
('2016_36', '2016-09-06'): 0.0,
('2016_36', '2016-09-07'): 0.0,
('2016_36', '2016-09-08'): 0.0,
('2016_36', '2016-09-09'): 0.0}}
df = pd.DataFrame(data)
df['2016_32']
KeyError: '2016_32'
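df['2016_32'] looks for a column named '2016_32', which is why it raises a KeyError. To select rows from a MultiIndex, use DataFrame.xs:
# the first level is the default, so the level argument can be omitted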
print (df.xs('2016_32'))
#select by second level
#print (df.xs('2016-09-07', level=1))
foo bar
2016-08-07 0.142857 NaN
2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
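As a minimal sketch of the xs options mentioned above, the snippet below uses a four-row stand-in for the question's frame. The level names `week` and `date` and the values in `foo` are assumptions made for illustration; the question's index has no level names.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame: (week, date) MultiIndex on the rows.
index = pd.MultiIndex.from_tuples(
    [("2016_32", "2016-08-07"), ("2016_32", "2016-08-08"),
     ("2016_36", "2016-09-04"), ("2016_36", "2016-09-05")],
    names=["week", "date"],  # assumed names, not present in the question
)
df = pd.DataFrame({"foo": [0.1, 0.2, 0.3, 0.4],
                   "bar": [np.nan, np.nan, 0.0, 0.0]}, index=index)

# Select one week; the matched first level is dropped from the result.
week = df.xs("2016_32")

# Same rows, but keep both index levels.
week_full = df.xs("2016_32", drop_level=False)

# Select on the second level, here by its (assumed) name.
day = df.xs("2016-09-04", level="date")
```

By default xs drops the level it matched on, so `drop_level=False` is useful when the result still needs the week label, for example before concatenating several weeks.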
Or use loc:
# no extra arguments are needed to select on the first level
print (df.loc['2016_32'])
# to select on the second level, pass axis=0 and use : to take all values of the first level
print (df.loc(axis=0)[:, '2016-09-07'])
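The following sketch contrasts plain `[]` indexing with loc on a small stand-in frame (the values are made up for illustration). It also shows `pd.IndexSlice`, which is one possible alternative to `df.loc(axis=0)` for selecting on the second level; the index is sorted first as a precaution, since MultiIndex slicing can require a sorted index.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2016_32", "2016-08-07"), ("2016_32", "2016-08-08"),
     ("2016_36", "2016-09-04"), ("2016_36", "2016-09-05")]
)
df = pd.DataFrame({"foo": [0.1, 0.2, 0.3, 0.4],
                   "bar": [np.nan, np.nan, 0.0, 0.0]}, index=index).sort_index()

# Plain [] looks up column labels, so a row label raises KeyError.
try:
    df["2016_32"]
    raised = False
except KeyError:
    raised = True

# loc looks up row labels; a scalar key drops the matched level.
week = df.loc["2016_32"]

# A list of keys keeps both index levels.
week_keep = df.loc[["2016_32"]]

# All weeks, one specific date on the second level.
idx = pd.IndexSlice
day = df.loc[idx[:, "2016-09-04"], :]
```

Here `raised` ends up True, which reproduces the error in the question, while the three loc calls select by row label.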
Selection works differently for a MultiIndex in the columns and a MultiIndex in the rows:
np.random.seed(235)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E','F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print (df)
    bar     baz     foo     qux
    one two one two one two one two
A E   8   1   5   8   3   5   3   3
  F   3   1   3   6   6   1   0   2
B E   0   3   1   7   0   0   8   2
  F   6   7   7   4   2   7   7   5
C E   7   3   1   7   3   9   7   3
  F   8   2   0   8   5   2   2   0
# select the 'bar' group on the first column level
print (df['bar'])
one two
A E 8 1
F 3 1
B E 0 3
F 6 7
C E 7 3
F 8 2
#select by column bar and then by `one`
print (df['bar']['one'])
A E 8
F 3
B E 0
F 6
C E 7
F 8
Name: one, dtype: int32
# select a single column by passing a tuple
print (df[('bar', 'one')])
A E 8
F 3
B E 0
F 6
C E 7
F 8
Name: (bar, one), dtype: int32
To select rows (MultiIndex in the index), use loc:
print (df.loc['A'])
  bar     baz     foo     qux
  one two one two one two one two
E   8   1   5   8   3   5   3   3
F   3   1   3   6   6   1   0   2
print (df.loc['A'].loc['F'])
bar one 3
two 1
baz one 3
two 6
foo one 6
two 1
qux one 0
two 2
Name: F, dtype: int32
print (df.loc[('A', 'F')])
bar one 3
two 1
baz one 3
two 6
foo one 6
two 1
qux one 0
two 2
Name: (A, F), dtype: int32
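The row and column lookups above can also be written with a single loc call, which avoids chained indexing such as `df['bar']['one']`. This is a sketch on a frame with the same shape as the example but with deterministic values from `np.arange` instead of random integers, so the numbers differ from the output shown above.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["bar", "baz", "foo", "qux"], ["one", "two"]])
rows = pd.MultiIndex.from_product([["A", "B", "C"], ["E", "F"]])
df = pd.DataFrame(np.arange(48).reshape(6, 8), index=rows, columns=cols)

# One row, addressed by a tuple of row labels.
row = df.loc[("A", "F"), :]

# One column, addressed by a tuple of column labels.
col = df.loc[:, ("bar", "one")]

# A single cell: row tuple first, column tuple second.
cell = df.loc[("A", "F"), ("bar", "one")]

# Every 'one' column across all first-level groups.
ones = df.xs("one", axis=1, level=1)
```

With loc the first position always refers to rows and the second to columns, so the same tuple syntax works on both axes.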
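Alternatively, with the original frame from the question, you can select the rows by position. swaplevel(0, 0) swaps a level with itself, so it leaves the order unchanged and the positional slice df[:7] does the selecting: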
>>> df[:7].swaplevel(0, 0, axis=0)
foo bar
2016_32 2016-08-07 0.142857 NaN
2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
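Or simply slice the rows directly (df[1:7] starts at the second row, so it omits 2016-08-07; use df[:7] for the full week):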
>>> df[1:7]
foo bar
2016_32 2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN