 

How do you update the levels of a pandas MultiIndex after slicing its DataFrame?


I have a DataFrame with a pandas MultiIndex:

    In [1]: import pandas as pd

    In [2]: multi_index = pd.MultiIndex.from_product([['CAN','USA'],['total']],names=['country','sex'])

    In [3]: df = pd.DataFrame({'pop':[35,318]},index=multi_index)

    In [4]: df
    Out[4]:
                   pop
    country sex
    CAN     total   35
    USA     total  318

Then I remove some rows from that DataFrame:

    In [5]: df = df.query('pop > 100')

    In [6]: df
    Out[6]:
                   pop
    country sex
    USA     total  318

But when I inspect the MultiIndex, it still has both countries in its levels:

    In [7]: df.index.levels[0]
    Out[7]: Index([u'CAN', u'USA'], dtype='object')
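This is expected behaviour rather than a bug: a MultiIndex stores the distinct values of each level in `levels`, plus one integer code per row pointing into those levels. Filtering rows only drops codes, so `'CAN'` stays in the level array even though no row uses it any more. The sketch below rebuilds the question's frame to show the difference; it assumes pandas 0.24 or later, where the per-row codes are exposed as `codes` (older releases called them `labels`).

```python
import pandas as pd

# Rebuild the example from the question and filter it the same way.
multi_index = pd.MultiIndex.from_product(
    [['CAN', 'USA'], ['total']], names=['country', 'sex'])
df = pd.DataFrame({'pop': [35, 318]}, index=multi_index)
df = df.query('pop > 100')

# levels still holds every value the index was originally built with...
print(df.index.levels[0].tolist())  # ['CAN', 'USA']

# ...but the per-row codes only point at position 1, i.e. 'USA'.
print(df.index.codes[0].tolist())  # [1]

# get_level_values returns the labels that actually appear in the rows.
print(df.index.get_level_values('country').unique().tolist())  # ['USA']
```

If you only need to read which labels are in use, `get_level_values` is enough and the index does not have to be rebuilt at all.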

I can fix this myself in a rather strange way:

    In [8]: idx_names = df.index.names

    In [9]: df = df.reset_index(drop=False)

    In [10]: df = df.set_index(idx_names)

    In [11]: df
    Out[11]:
                   pop
    country sex
    USA     total  318

    In [12]: df.index.levels[0]
    Out[12]: Index([u'USA'], dtype='object')
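The round trip works because `reset_index()` moves every index level into an ordinary column, and `set_index()` then builds a fresh MultiIndex from the values that remain, so labels no longer present in any row never make it into the new levels. A minimal sketch wrapping it in a reusable function; `rebuild_index` is a hypothetical helper name, and it assumes every index level is named and that no existing column shares a name with a level (otherwise `reset_index()` refuses to insert the duplicate column).

```python
import pandas as pd


def rebuild_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rebuild the index levels via a column round trip.

    reset_index() turns each index level into a regular column, and
    set_index() creates a new MultiIndex from the values that are left,
    so unused labels are dropped from the levels.
    """
    names = list(frame.index.names)  # assumes every level has a name
    return frame.reset_index().set_index(names)


multi_index = pd.MultiIndex.from_product(
    [['CAN', 'USA'], ['total']], names=['country', 'sex'])
df = pd.DataFrame({'pop': [35, 318]}, index=multi_index)
df = rebuild_index(df.query('pop > 100'))

print(df.index.levels[0].tolist())  # ['USA']
```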

But this seems rather messy. Is there a better way I'm missing?

asked Feb 27 '15 at 19:02 by Kyle Heuton




1 Answer

From pandas 0.20.0 onward, use `MultiIndex.remove_unused_levels`:

    print(df.index)
    MultiIndex(levels=[['CAN', 'USA'], ['total']],
               labels=[[1], [0]],
               names=['country', 'sex'])

    df.index = df.index.remove_unused_levels()

    print(df.index)
    MultiIndex(levels=[['USA'], ['total']],
               labels=[[0], [0]],
               names=['country', 'sex'])
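A self-contained version of the answer, run against the question's data. `remove_unused_levels()` returns a new MultiIndex rather than modifying the existing one, which is why the result is assigned back to `df.index`. The sketch assumes pandas 0.24 or later for the `codes` attribute; in 0.20 to 0.23 the same information is reported as `labels`, as in the output above.

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product(
    [['CAN', 'USA'], ['total']], names=['country', 'sex'])
df = pd.DataFrame({'pop': [35, 318]}, index=multi_index)
df = df.query('pop > 100')

# remove_unused_levels() does not work in place: it returns a new
# MultiIndex, so assign it back to the DataFrame.
df.index = df.index.remove_unused_levels()

print(df.index.levels[0].tolist())  # ['USA']
# The codes are renumbered to match the trimmed levels.
print(df.index.codes[0].tolist())  # [0]
```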
answered Oct 12 '22 at 10:10 by jezrael