
How to maintain Pandas DataFrame index order when using stack/unstack?

Tags: python, pandas

Example One: Notice the index order of the given Pandas DataFrame df:

>>> df
              A  B
first second      
zzz   z       2  4
      a       1  5
aaa   z       6  3
      a       7  8

After calling unstack and then stack on df, the index comes back sorted lexicographically (alphabetically), so the original order of the rows is lost.

>>> df.unstack().stack()
              A  B
first second      
aaa   a       7  8
      z       6  3
zzz   a       1  5
      z       2  4
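For anyone who wants to reproduce the two frames above, here is a minimal sketch. The question doesn't show how df was built, so the `MultiIndex.from_tuples` construction and the `roundtrip` name are assumptions made for illustration.

```python
import pandas as pd

# Assumed construction of the question's df: rows deliberately NOT in sorted order.
index = pd.MultiIndex.from_tuples(
    [("zzz", "z"), ("zzz", "a"), ("aaa", "z"), ("aaa", "a")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [2, 1, 6, 7], "B": [4, 5, 3, 8]}, index=index)

# Round trip through unstack/stack; the values survive, the row order does not.
roundtrip = df.unstack().stack()

print(list(df.index))
# [('zzz', 'z'), ('zzz', 'a'), ('aaa', 'z'), ('aaa', 'a')]
print(list(roundtrip.index))
# [('aaa', 'a'), ('aaa', 'z'), ('zzz', 'a'), ('zzz', 'z')]
```

Comparing the two index lists makes the reordering easy to check programmatically instead of by eye.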

Is it possible to maintain the original ordering after the unstack/stack operations above?

According to the official documentation (Reshaping by stacking and unstacking):

Notice that the stack and unstack methods implicitly sort the index levels involved. Hence a call to stack and then unstack, or vice versa, will result in a sorted copy of the original DataFrame or Series.

Example Two:

>>> dfu = df.unstack()
>>> dfu
         A      B   
second   a  z   a  z
first               
aaa      7  6   8  3
zzz      1  2   5  4

If the original index order were preserved, dfu would look like this:

>>> dfu
         A      B   
second   a  z   a  z
first               
zzz      1  2   5  4
aaa      7  6   8  3

What I'm looking for is a solution that restores the index order of the original DataFrame after unstack() or stack() has been called.
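One possible way to get the dfu layout wanted in Example Two is to reindex the unstacked frame by the first-level labels in their order of first appearance. This is only a sketch: the df construction and the `first_order` name are assumptions, and only the row order is restored (the columns stay sorted, as in the desired output).

```python
import pandas as pd

# Assumed construction of the question's df.
index = pd.MultiIndex.from_tuples(
    [("zzz", "z"), ("zzz", "a"), ("aaa", "z"), ("aaa", "a")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [2, 1, 6, 7], "B": [4, 5, 3, 8]}, index=index)

# Labels of the 'first' level in order of first appearance: zzz, then aaa.
first_order = df.index.get_level_values("first").unique()

# unstack() sorts the rows; reindex puts them back in the original order.
dfu = df.unstack().reindex(first_order)
print(dfu)
```

`Index.unique()` keeps labels in order of appearance, which is what makes it usable as the reindex target here.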

AtlasStrategic asked Nov 09 '15 08:11



1 Answer

You can keep a reference to the original index and reindex to it after reshaping (thanks to Andy Hayden).

Demo:

#              A  B
#first second      
#zzz   z       2  4
#      a       1  5
#aaa   z       6  3
#      a       7  8

print(df.index)
#MultiIndex([('zzz', 'z'),
#            ('zzz', 'a'),
#            ('aaa', 'z'),
#            ('aaa', 'a')],
#           names=['first', 'second'])

#save the original index in a variable
index = df.index

#unstack and stack (this sorts the index)
df = df.unstack().stack()
print(df)
#              A  B
#first second      
#aaa   a       7  8
#      z       6  3
#zzz   a       1  5
#      z       2  4

#restore the original row order
df = df.reindex(index)
print(df)
#              A  B
#first second      
#zzz   z       2  4
#      a       1  5
#aaa   z       6  3
#      a       7  8
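
The same idea can be wrapped in a small helper. This is a sketch under assumptions, not part of pandas: the function name `stack_roundtrip_keep_order` and the df construction are made up for illustration.

```python
import pandas as pd


def stack_roundtrip_keep_order(frame):
    """Unstack then stack `frame`, then restore its original row order.

    Hypothetical helper: it remembers the index before reshaping and
    reindexes the reshaped result back to it.
    """
    original_index = frame.index
    reshaped = frame.unstack().stack()
    return reshaped.reindex(original_index)


# Assumed construction of the question's df.
index = pd.MultiIndex.from_tuples(
    [("zzz", "z"), ("zzz", "a"), ("aaa", "z"), ("aaa", "a")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [2, 1, 6, 7], "B": [4, 5, 3, 8]}, index=index)

restored = stack_roundtrip_keep_order(df)
print(restored)
```

Note that reindex fills rows with NaN for any label that is missing from the reshaped frame, so this works cleanly only when every original row survives the reshape.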

jezrael answered Oct 06 '22 17:10