I have a dataframe with this index:
index = pd.MultiIndex.from_product([['stock1','stock2'...],['price','volume'...]])
It's a useful structure for being able to do df['stock1'], but how do I select all the price data? I can't make any sense of the documentation.
I've tried the following with no luck:

    df[:,'price']
    df[:]['price']
    df.loc(axis=1)[:,'close']
    df['price']
If this index style is generally agreed to be a bad idea for whatever reason, then what would be a better choice? Should I go for a multi-indexed index for the stocks as labels on the time series instead of at the column level?
Many thanks
EDIT - I am using the multiindex for the columns, not the index (the wording got the better of me). The examples in the documentation focus on multi-level indexes rather than column structures.
Python – Drop multiple levels from a multi-level column index in a pandas DataFrame. To drop multiple levels from a multi-level column index, call columns.droplevel() repeatedly.
pandas MultiIndex to columns: use the pandas DataFrame.reset_index() function to convert a MultiIndex (multi-level index) to columns. The default setting for the parameter is drop=False, which keeps the index values as columns and gives the DataFrame a new index starting from zero.
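As a rough illustration of the two calls mentioned above, here is a minimal sketch. The frames, the level names `stock` and `field`, and the column name `value` are made up for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical frame with two column levels: (stock, field).
columns = pd.MultiIndex.from_product([["stock1"], ["price", "volume"]])
df_cols = pd.DataFrame([[1, 2]], columns=columns)

# Drop the outer column level, leaving only 'price' and 'volume'.
flat = df_cols.copy()
flat.columns = flat.columns.droplevel(0)

# Hypothetical frame with a two-level row index.
index = pd.MultiIndex.from_product(
    [["stock1", "stock2"], ["price", "volume"]], names=["stock", "field"]
)
df_rows = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# reset_index() (drop=False by default) turns both index levels into
# ordinary columns and installs a fresh 0..n-1 index.
as_columns = df_rows.reset_index()
```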
Also using John's data sample:
Using xs() is another way to slice a MultiIndex:

    df
                   0
    stock1 price   1
           volume  2
    stock2 price   3
           volume  4
    stock3 price   5
           volume  6

    df.xs('price', level=1, drop_level=False)
                  0
    stock1 price  1
    stock2 price  3
    stock3 price  5
Alternatively, if you have a MultiIndex in place of columns:

    df
      stock1        stock2        stock3
       price volume  price volume  price volume
    0      1      2      3      4      5      6

    df.xs('price', axis=1, level=1, drop_level=False)
      stock1 stock2 stock3
       price  price  price
    0      1      3      5
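For anyone who wants to run the column-level case directly, here is a minimal self-contained sketch. The one-row frame is rebuilt by hand to match the sample above, and the variable names are only illustrative.

```python
import pandas as pd

# Sample data: one row, columns are (stock, field) pairs.
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df_cols = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# drop_level=False keeps both column levels, so the result's columns
# are still (stock, 'price') tuples.
prices_kept = df_cols.xs("price", axis=1, level=1, drop_level=False)

# With the default drop_level=True the selected level is removed and
# the columns become plain stock names.
prices_flat = df_cols.xs("price", axis=1, level=1)
```

Whether to keep the level mostly depends on what happens next: the flat form is convenient for plotting or arithmetic across stocks, while the two-level form concatenates back with the other fields more easily.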
Using @JohnZwinck's data sample:
    In [132]: df
    Out[132]:
                   0
    stock1 price   1
           volume  2
    stock2 price   3
           volume  4
    stock3 price   5
           volume  6
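Option 1:

    In [133]: df.loc[(slice(None), slice('price')), :]
    Out[133]:
                  0
    stock1 price  1
    stock2 price  3
    stock3 price  5

Note that slice('price') means "every label up to and including 'price'"; it isolates the price rows here only because 'price' is the first label in the sorted second level.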
Option 2:
    In [134]: df.loc[pd.IndexSlice[:, 'price'], :]
    Out[134]:
                  0
    stock1 price  1
    stock2 price  3
    stock3 price  5
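Since the question's edit says the MultiIndex is on the columns, the same idea can be applied along the column axis by moving the slicer to the second position of .loc. This is a minimal sketch with a made-up two-stock frame, not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame: two rows, columns are (stock, field) pairs.
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2"], ["price", "volume"]]
)
df_cols = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

idx = pd.IndexSlice

# The first ':' selects all rows; idx[:, 'price'] selects the 'price'
# column of every stock.
prices = df_cols.loc[:, idx[:, "price"]]

# Equivalent spelling with an explicit slice object instead of IndexSlice.
prices_alt = df_cols.loc[:, (slice(None), "price")]
```

Slicing a MultiIndex this way generally expects the axis to be lexsorted; columns built with from_product from sorted inputs, as here, already are. Otherwise df.sort_index(axis=1) can be applied first.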
UPDATE:
But what if, for the 2nd index level, I want to select everything but price, and there are so many values that enumerating them is not an option? Is there something like slice(~'price')?
First let's name the index levels:

    df = df.rename_axis(["lvl0", "lvl1"])

Now we can use the df.query() method:

    In [18]: df.query("lvl1 != 'price'")
    Out[18]:
                   0
    lvl0   lvl1
    stock1 volume  2
    stock2 volume  4
    stock3 volume  6
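df.query() filters rows, so for a MultiIndex on the columns a different route is needed. One possible sketch uses a boolean mask built from the column level's labels. The frame and the extra 'bid' field are invented for illustration.

```python
import pandas as pd

# Hypothetical frame: columns are (stock, field) pairs; 'bid' is a
# made-up extra field so that more than one label survives the filter.
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2"], ["price", "volume", "bid"]]
)
df_cols = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# Boolean mask over the second column level: True where the label
# is anything other than 'price'.
mask = df_cols.columns.get_level_values(1) != "price"
not_price = df_cols.loc[:, mask]

# Alternative: drop the 'price' label from that level directly.
not_price_alt = df_cols.drop(columns="price", level=1)
```

The mask version extends to other conditions (for example `.isin([...])` on the level values), while `drop` is shorter when only one or two labels need excluding.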