I have a pandas (version 1.0.5) DataFrame with a MultiIndex of two levels, for instance:
import numpy as np
import pandas as pd
mi = pd.MultiIndex.from_product((('a', 'c'), (5, 12)))
np.random.seed(123)
df = pd.DataFrame(data=np.random.rand(4, 2), index=mi, columns=['x', 'y'])
I want to reindex the first level of the MultiIndex to contain the keys ['a', 'b', 'c', 'd']. Missing values should be filled with np.nan.
For a non-multiindex dataframe, I'd simply reindex with df.reindex(index=['a', 'b', 'c', 'd']).
Now with the MultiIndex, I assumed that this should work (I also tried all other combinations of the arguments labels, axis and index):
df.reindex(index=['a', 'b', 'c', 'd'], level=0)
But instead the reindex call seems to be ignored completely, and the unaltered dataframe is returned:
             x         y
a 5   0.696469  0.286139
  12  0.226851  0.551315
c 5   0.719469  0.423106
  12  0.980764  0.684830
The only way I can reindex the MultiIndex is by fully generating a new MultiIndex:
df.reindex(index=pd.MultiIndex.from_product((
['a', 'b', 'c', 'd'], df.index.get_level_values(1).unique())))
Imho there must be an easier way to do it, otherwise I don't see any use in the level argument of the reindex method. Furthermore, I quite often have several index levels, which makes reindexing extremely cumbersome.
Did I miss anything? Any idea how to reindex directly without having to explicitly generate a new multiindex?
This behaviour is not expected. Passing the level argument to reindex on a MultiIndex still appears to be broken as of pandas version 1.2.3. There is an issue on GitHub covering this:
https://github.com/pandas-dev/pandas/issues/25460
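Until that is resolved, the workaround from the question can be wrapped in a small helper so it also covers frames with more than two index levels. This is only a sketch: reindex_level is a hypothetical name, not a pandas function, and it assumes level is given as an integer position. It also takes the Cartesian product of all levels, so combinations of the other levels that were missing from the original index will show up as NaN rows too.

```python
import numpy as np
import pandas as pd


def reindex_level(df, labels, level=0):
    """Hypothetical helper (not a pandas API): reindex one level of a MultiIndex."""
    # Keep the existing unique values of every level except the one being replaced.
    levels = [
        list(labels) if i == level else df.index.get_level_values(i).unique()
        for i in range(df.index.nlevels)
    ]
    # Caveat: the Cartesian product also creates combinations of the other
    # levels that were absent from the original index.
    new_index = pd.MultiIndex.from_product(levels, names=df.index.names)
    return df.reindex(index=new_index)


# Sample frame from the question
mi = pd.MultiIndex.from_product((('a', 'c'), (5, 12)))
np.random.seed(123)
df = pd.DataFrame(data=np.random.rand(4, 2), index=mi, columns=['x', 'y'])

result = reindex_level(df, ['a', 'b', 'c', 'd'], level=0)
print(result)
```

The rows for 'b' and 'd' come back filled with NaN, while the existing rows for 'a' and 'c' keep their values.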