
subset multi-indexed df based on multiple level 1 columns

I have a multi-indexed DataFrame, and for each of the level 0 variables I want to keep only two of the level 1 columns (i.e. columns 'one' and 'two'). I can subset them separately, but I would like to do it in one step so I can keep the values side by side.

Here is the DataFrame

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(list(zip(*[['bar1', 'foo1', 'bar1', 'foo2', 'bar3', 'foo3'], ['one', 'two', 'three', 'two', 'one', 'four']])))
df = pd.DataFrame(np.random.randn(2, 6), columns=index)

Here is the way to subset for one column in level 1

df.iloc[:, df.columns.get_level_values(1)== 'one']
# or 
df.xs('one', level=1, axis=1)

# but passing two labels to either command does not work, e.g.
df.xs(('one', 'two'), level=1, axis=1)

This would be the expected output

         bar1        foo1       foo2         bar3
          one         two        two          one
0   -0.508272   -0.195379   0.865563     2.002205
1   -0.771565    1.360479   1.900931    -1.589277

an10b3 asked Aug 13 '21 15:08



4 Answers

Here's one way using pd.IndexSlice:

idnx = pd.IndexSlice[:, ['one', 'two']]
df.loc[:, idnx]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

Another way, using a little-known argument, axis, of pd.DataFrame.loc:

df.loc(axis=1)[:, ['one', 'two']]

Output:

       bar1      bar3      foo1      foo2
        one       one       two       two
0  0.589999  0.261224 -0.106588 -2.309628
1  0.646201 -0.491110  0.430724  1.027424

NOTE: This argument is not listed in the documented API for pd.DataFrame.loc, but it is referenced in the User Guide, in the MultiIndex / Advanced indexing section, about midway down the "Using slicers" paragraph, with an example.
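
As a rough check that the two spellings above pick the same columns, here is a minimal self-contained sketch. The fixed seed and the names via_slice, via_axis and same_columns are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the fixed seed is an assumption for repeatability.
columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2, 6)), columns=columns)

# Spelling 1: an IndexSlice covering every level 0 label and two level 1 labels.
via_slice = df.loc[:, pd.IndexSlice[:, ["one", "two"]]]

# Spelling 2: the axis argument tells .loc that the key addresses the columns.
via_axis = df.loc(axis=1)[:, ["one", "two"]]

# Compare as sets, because the resulting column order may differ from the original frame.
same_columns = set(via_slice.columns) == set(via_axis.columns)
print(same_columns)
```

The comparison uses sets on purpose: as the output above shows, slicer-based selection can return the columns in a different order than they appear in df.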

Scott Boston answered Nov 10 '22 07:11


We can use Index.isin on a specific level to create a Boolean index and select with loc:

df.loc[:, df.columns.isin(['one', 'two'], level=1)]

df:

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.042062 -0.233098  0.620974  0.330957
1  0.524495 -0.394930  0.572631  0.499279
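
To make the Boolean index visible, the following sketch prints the mask itself and the columns it keeps. The zero-filled placeholder data and the names mask and selected are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Same column layout as the question; the values are placeholders (an assumption).
columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
df = pd.DataFrame(np.zeros((2, 6)), columns=columns)

# One Boolean per column: True where the level 1 label is 'one' or 'two'.
mask = columns.isin(["one", "two"], level=1)
print(mask.tolist())
# [True, True, False, True, True, False]

# A Boolean mask keeps the surviving columns in their original order.
selected = df.loc[:, mask]
print(selected.columns.tolist())
# [('bar1', 'one'), ('foo1', 'two'), ('foo2', 'two'), ('bar3', 'one')]
```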

Henry Ecker answered Nov 10 '22 06:11


You can reindex and specify the level.

df.reindex(['one', 'two'], axis=1, level=1)

       bar1      foo1      foo2      bar3
        one       two       two       one
0  0.276056  1.956400 -1.495128  1.582220
1 -0.383178  1.159138 -1.646173  0.821942
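
A small self-contained sketch of the same call, checking only which level 1 labels survive and that the values are carried over unchanged. The seed and the names reindexed and kept_labels are assumptions; the resulting column order is deliberately not asserted here, since it may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; the fixed seed is an assumption for repeatability.
columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((2, 6)), columns=columns)

# Reindex the columns against level 1 only; level 0 labels are kept as they are.
reindexed = df.reindex(["one", "two"], axis=1, level=1)

# Inspect which level 1 labels remain after the reindex.
kept_labels = set(reindexed.columns.get_level_values(1))
print(kept_labels)
```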

ALollz answered Nov 10 '22 07:11


Try the old-fashioned get_level_values approach:

out = df.loc[:, df.columns.get_level_values(1).isin(['one', 'two'])]

Output:

       bar1      foo1      foo2      bar3
        one       two       two       one
0 -0.705540 -1.175132 -0.572076 -1.549703
1  0.277905  1.789925  1.104225  0.104453
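
This mask is the same one that Index.isin with level=1 produces, so the two Boolean approaches are interchangeable. A quick sketch to confirm it, where the names by_level_values, by_index_isin and masks_match are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Only the column index matters for this check, so no DataFrame is built.
columns = pd.MultiIndex.from_tuples([
    ("bar1", "one"), ("foo1", "two"), ("bar1", "three"),
    ("foo2", "two"), ("bar3", "one"), ("foo3", "four"),
])

# Mask built by extracting level 1 as a flat Index first.
by_level_values = columns.get_level_values(1).isin(["one", "two"])

# Mask built directly on the MultiIndex with the level argument.
by_index_isin = columns.isin(["one", "two"], level=1)

masks_match = np.array_equal(by_level_values, by_index_isin)
print(masks_match)
# True
```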

BENY answered Nov 10 '22 05:11