I have a MultiIndex DataFrame that looks like this:
           value
year name
1921 Ah       40
1921 Ai       90
1922 Ah      100
1922 Ai        7
in which year and name are the indices. I want to select every row where the name Ai appears. I have tried df.loc[(:,'Ai')] and df.loc['Ai'], but both give errors. How do I index only using the name column?
To drop a level from a multi-level column index, use columns.droplevel(). MultiIndex.from_tuples() can be used to create a multi-level column index.
The main difference between pandas loc[] and iloc[] is that loc[] selects DataFrame rows and columns by label, while iloc[] selects them by integer position. With loc[], a missing label raises a KeyError; with iloc[], an out-of-range position raises an IndexError.
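A small sketch of the loc[] versus iloc[] behaviour described above. The frame and its labels ("a", "b", "c") are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frame with string row labels, for illustration only.
small = pd.DataFrame({"value": [40, 90, 100]}, index=["a", "b", "c"])

by_label = small.loc["b", "value"]   # label-based lookup
by_position = small.iloc[1, 0]       # position-based lookup of the same cell

# A label that does not exist raises KeyError with loc[].
try:
    small.loc["z"]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

# A position past the end raises IndexError with iloc[].
try:
    small.iloc[10]
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__
```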
@sacul has the most idiomatic answer, but here are a few alternatives.
MultiIndex.get_level_values
df[df.index.get_level_values('name') == 'Ai']
           value
year name
1921 Ai       90
1922 Ai        7
DataFrame.query
df.query('name == "Ai"')
           value
year name
1921 Ai       90
1922 Ai        7
DataFrame.loc(axis=0) with pd.IndexSlice
Similar to @liliscent's answer, but does not need the trailing : if you specify axis=0.
df.loc(axis=0)[pd.IndexSlice[:, 'Ai']]
           value
year name
1921 Ai       90
1922 Ai        7
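For anyone who wants to try these, here is a self-contained sketch that rebuilds the frame from the question and runs the three alternatives side by side. The construction via MultiIndex.from_tuples is my assumption about how the data was built; the question does not show it.

```python
import pandas as pd

# Rebuild the example frame (construction method is an assumption).
index = pd.MultiIndex.from_tuples(
    [(1921, "Ah"), (1921, "Ai"), (1922, "Ah"), (1922, "Ai")],
    names=["year", "name"],
)
df = pd.DataFrame({"value": [40, 90, 100, 7]}, index=index)

# 1. Boolean mask built from the values of the "name" level.
by_mask = df[df.index.get_level_values("name") == "Ai"]

# 2. query() can refer to index level names directly.
by_query = df.query('name == "Ai"')

# 3. Slice all years, then pick "Ai" on the second level.
by_slice = df.loc(axis=0)[pd.IndexSlice[:, "Ai"]]

# All three select the same rows.
assert by_mask.equals(by_query)
assert by_mask.equals(by_slice)
```

Note that the IndexSlice form relies on the index being sorted, which it is here.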
I would use .xs on level 1 of your MultiIndex (note: level=1 refers to the second index level, name, because of Python's zero-based indexing; level 0 is year in your case):
df.xs('Ai', level=1, drop_level=False)
# or
df.xs('Ai', level='name', drop_level=False)
           value
year name
1921 Ai       90
1922 Ai        7
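To illustrate what drop_level=False changes, here is a minimal sketch. The frame is rebuilt with MultiIndex.from_tuples, which is an assumption on my part since the question does not show how it was created.

```python
import pandas as pd

# Rebuild the example frame (construction method is an assumption).
index = pd.MultiIndex.from_tuples(
    [(1921, "Ah"), (1921, "Ai"), (1922, "Ah"), (1922, "Ai")],
    names=["year", "name"],
)
df = pd.DataFrame({"value": [40, 90, 100, 7]}, index=index)

# drop_level=False keeps both index levels in the result.
kept = df.xs("Ai", level="name", drop_level=False)

# The default (drop_level=True) removes the selected level,
# leaving only "year" as the index.
dropped = df.xs("Ai", level="name")
```

If you only need the values per year, the default is usually enough; keep drop_level=False when the result should line up with the original index.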