Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas pivot_table preserve order

>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

I want the output to be as shown above, but I get a sorted output. bar comes above foo and so on.

like image 244
Rahul Ranjan Avatar asked Jul 08 '17 16:07

Rahul Ranjan


3 Answers

I think pivot_table doesn't have an option for sorting, but groupby has:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().unstack('C')
Out: 
C        small  large
A   B                
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0

You pass the grouping columns to groupby and for the ones you want to show as column values, you use unstack.

If you don't want the index names, rename them as None:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().rename_axis([None, None, None]).unstack(level=2)
Out: 
         small  large
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0
like image 172
ayhan Avatar answered Nov 05 '22 09:11

ayhan


Since pandas 1.3.0, it's possible to specify sort=False in pd.pivot_table:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'],
...                aggfunc='sum', sort=False)
C        large  small
A   B                
foo one    4.0    1.0
    two    NaN    6.0
bar one    4.0    5.0
    two    7.0    6.0
like image 4
Eric Duminil Avatar answered Nov 05 '22 09:11

Eric Duminil


While creating pivot_table, the index is automatically sorted alphabetically. Not only foo and bar, you may also notice small and large is sorted. If you want foo on top, you may need to sort them again using sortlevel. If you are expecting output as in example here, then sorting on A and C both may be needed.

table.sortlevel(["A","B"], ascending= [False,True], sort_remaining=False, inplace=True)
table.sortlevel(["C"], axis=1, ascending=False,  sort_remaining=False, inplace=True)
print(table)

Output:

C        small  large
A   B                
foo one  1.0    4.0  
    two  6.0    NaN   
bar one  5.0    4.0  
    two  6.0    7.0  

Update:

To remove index names A, B and C:

table.columns.name = None
table.index.names = (None, None)
like image 3
student Avatar answered Nov 05 '22 10:11

student