
pandas reindexing multiindex not working properly

Tags:

python

pandas

I have a pandas (version 1.0.5) DataFrame with a two-level MultiIndex, for instance:

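import numpy as np
import pandas as pd

# Two-level MultiIndex: first level 'a'/'c', second level 5/12
mi = pd.MultiIndex.from_product((('a', 'c'), (5, 12)))
np.random.seed(123)  # fixed seed so the values below are reproducible
df = pd.DataFrame(data=np.random.rand(4, 2), index=mi, columns=['x', 'y'])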

I want to reindex the first level of the MultiIndex to contain the keys ['a', 'b', 'c', 'd']. Missing values should be filled with np.nan.

For a non-multiindex dataframe, I'd simply reindex with df.reindex(index=['a', 'b', 'c', 'd']).
Now with the MultiIndex, I assumed that this should work (I also tried all other combinations of the arguments labels, axis and index):

df.reindex(index=['a', 'b', 'c', 'd'], level=0)

But instead the call seems to have no effect at all and returns the unaltered DataFrame:

             x         y
a 5   0.696469  0.286139
  12  0.226851  0.551315
c 5   0.719469  0.423106
  12  0.980764  0.684830

The only way I have found to reindex the MultiIndex is to generate a complete new MultiIndex:

df.reindex(index=pd.MultiIndex.from_product((
    ['a', 'b', 'c', 'd'], df.index.get_level_values(1).unique())))

Imho there must be an easier way to do it; otherwise I don't see any use for the level argument of the reindex method. Furthermore, I quite often have several index levels, which makes reindexing this way extremely cumbersome.

Did I miss anything? Any idea how to reindex directly without having to explicitly generate a new multiindex?

JE_Muc asked Nov 07 '22 06:11


1 Answer

This behaviour is not expected. Passing the level argument to reindex on a MultiIndex still appears to be broken as of pandas version 1.2.3. There is an issue on GitHub covering this:

https://github.com/pandas-dev/pandas/issues/25460
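Until that is resolved, the workaround from the question can be wrapped in a small helper so it does not have to be spelled out for every level. The following is only a sketch: reindex_level is a hypothetical name, not a pandas function, and it assumes that a full Cartesian product of the other levels' existing values is acceptable (it will add rows for combinations that were not in the original index).

```python
import numpy as np
import pandas as pd


def reindex_level(df, labels, level=0):
    """Hypothetical helper (not part of pandas): reindex one MultiIndex level.

    Builds a new MultiIndex as the Cartesian product of `labels` for the
    chosen level and the existing unique values of every other level,
    then reindexes `df` against it. Missing rows are filled with NaN.
    """
    idx = df.index
    # Accept either a positional level number or a level name.
    level_num = level if isinstance(level, int) else list(idx.names).index(level)
    # Existing unique values per level, in order of appearance.
    parts = [idx.get_level_values(i).unique() for i in range(idx.nlevels)]
    # Swap in the requested labels for the target level.
    parts[level_num] = pd.Index(labels)
    new_index = pd.MultiIndex.from_product(parts, names=idx.names)
    return df.reindex(new_index)


# Usage with the DataFrame from the question
mi = pd.MultiIndex.from_product((('a', 'c'), (5, 12)))
np.random.seed(123)
df = pd.DataFrame(data=np.random.rand(4, 2), index=mi, columns=['x', 'y'])

out = reindex_level(df, ['a', 'b', 'c', 'd'], level=0)
print(out)
```

This is the same from_product approach as in the question, just generalized over the number of levels, so it does not depend on the level argument of reindex at all.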

gofvonx answered Nov 14 '22 21:11