I have a multi-indexed DataFrame, but I want to keep only two of the level 1 columns ('one' and 'two') for each of the level 0 variables. I can subset them separately, but I would like to do it in one step so I can keep the values side by side.
Here is the DataFrame:
import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples(list(zip(*[['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'], ['one', 'two', 'three', 'two', 'one', 'four']])))
df = pd.DataFrame(np.random.randn(2, 6), columns=index)
Here is how to select a single label in level 1:
df.iloc[:, df.columns.get_level_values(1) == 'one']
# or
df.xs('one', level=1, axis=1)
# but passing two labels to either command does not work, e.g.
df.xs(('one', 'two'), level=1, axis=1)
This would be the expected output:
       bar1      foo1      foo2      bar3
        one       two       two       one
0 -0.508272 -0.195379  0.865563  2.002205
1 -0.771565  1.360479  1.900931 -1.589277
Here's one way using pd.IndexSlice:
idnx = pd.IndexSlice[:, ['one', 'two']]
df.loc[:, idnx]
Output:
       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424
Another way uses a little-known argument, axis, of pd.DataFrame.loc:
df.loc(axis=1)[:, ['one', 'two']]
Output:
       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424
NOTE: This argument is not listed in the API reference for pd.DataFrame.loc, but the User Guide mentions it, with an example, in the "Using slicers" paragraph of the MultiIndex / advanced indexing section, about halfway down the page.
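As a quick check that the two spellings above are interchangeable, here is a minimal sketch. It assumes a small deterministic frame instead of random data so the result is easy to inspect, and the names `via_slice` and `via_axis` are only illustrative. Column order in the result may differ between pandas versions when the columns are not lexsorted, so the printed labels are sorted before display.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question, with deterministic values.
columns = pd.MultiIndex.from_tuples([
    ('bar1', 'one'), ('foo1', 'two'), ('bar1', 'three'),
    ('foo2', 'two'), ('bar3', 'one'), ('foo3', 'four'),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Spelling 1: build the column indexer with pd.IndexSlice.
via_slice = df.loc[:, pd.IndexSlice[:, ['one', 'two']]]

# Spelling 2: tell .loc that the whole key applies to the columns axis.
via_axis = df.loc(axis=1)[:, ['one', 'two']]

# Both keep every column whose level-1 label is 'one' or 'two'.
print(sorted(via_slice.columns))
print(via_slice.equals(via_axis))
```

The `axis=1` form saves building a separate indexer object; `pd.IndexSlice` is handy when the same indexer is reused in several places.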
We can use Index.isin on a specific level to create a Boolean index and select with loc:
df.loc[:, df.columns.isin(['one', 'two'], level=1)]
Output:
       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.042062 -0.233098  0.620974  0.330957
1  0.524495 -0.394930  0.572631  0.499279
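To make the Boolean index visible, here is a small sketch that prints the mask before using it. The deterministic values and the names `mask` and `subset` are assumptions for illustration; the column layout is the one from the question.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question, with deterministic values.
columns = pd.MultiIndex.from_tuples([
    ('bar1', 'one'), ('foo1', 'two'), ('bar1', 'three'),
    ('foo2', 'two'), ('bar3', 'one'), ('foo3', 'four'),
])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# One True/False entry per column: True where the level-1 label matches.
mask = columns.isin(['one', 'two'], level=1)
print(mask.tolist())  # [True, True, False, True, True, False]

# A Boolean mask never reorders, so the original column order is kept.
subset = df.loc[:, mask]
print(subset.columns.tolist())
# [('bar1', 'one'), ('foo1', 'two'), ('foo2', 'two'), ('bar3', 'one')]
```

Because the mask is a plain Boolean array, it can be combined with other conditions using `&` and `|` before it is passed to loc.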
You can reindex and specify the level.
df.reindex(['one', 'two'], axis=1, level=1)
       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.276056  1.956400 -1.495128  1.582220
1 -0.383178  1.159138 -1.646173  0.821942
Try the old-fashioned get_level_values:
out = df.loc[:, df.columns.get_level_values(1).isin(['one', 'two'])]
Out[454]:
       bar1      foo1      foo2      bar3
        one       two       two       one
0 -0.705540 -1.175132 -0.572076 -1.549703
1  0.277905  1.789925  1.104225  0.104453
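If this selection is needed in several places, the same idea can be wrapped in a small function. This is only a sketch: `keep_labels` is a hypothetical helper name, and the level names 'group' and 'part' are assumptions added here to show that get_level_values accepts a level name as well as a position.

```python
import numpy as np
import pandas as pd

def keep_labels(frame, labels, level):
    """Keep the columns whose label at `level` is in `labels` (hypothetical helper)."""
    mask = frame.columns.get_level_values(level).isin(labels)
    return frame.loc[:, mask]

# Column layout from the question; the level names are an assumption.
columns = pd.MultiIndex.from_tuples(
    [('bar1', 'one'), ('foo1', 'two'), ('bar1', 'three'),
     ('foo2', 'two'), ('bar3', 'one'), ('foo3', 'four')],
    names=['group', 'part'],
)
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# The level can be given by position or by name.
by_position = keep_labels(df, ['one', 'two'], level=1)
by_name = keep_labels(df, ['one', 'two'], level='part')

print(by_position.columns.tolist())
print(by_position.equals(by_name))
```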