>>> df
A B C D
0 foo one small 1
1 foo one large 2
2 foo one large 2
3 foo two small 3
4 foo two small 3
5 bar one large 4
6 bar one small 5
7 bar two small 6
8 bar two large 7
>>> table = pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum)
>>> table
small large
foo one 1 4
two 6 NaN
bar one 5 4
two 6 7
I want the output to be as shown above, but I get a sorted output. bar comes above foo and so on.
I think pivot_table doesn't have an option for sorting, but groupby has:
df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().unstack('C')
Out:
C small large
A B
foo one 1.0 4.0
two 6.0 NaN
bar one 5.0 4.0
two 6.0 7.0
You pass the grouping columns to groupby and for the ones you want to show as column values, you use unstack.
If you don't want the index names, rename them as None:
df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().rename_axis([None, None, None]).unstack(level=2)
Out:
small large
foo one 1.0 4.0
two 6.0 NaN
bar one 5.0 4.0
two 6.0 7.0
Since pandas 1.3.0, it's possible to specify sort=False
in pd.pivot_table
:
>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
... "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'],
... aggfunc='sum', sort=False)
C large small
A B
foo one 4.0 1.0
two NaN 6.0
bar one 4.0 5.0
two 7.0 6.0
While creating pivot_table
, the index is automatically sorted alphabetically. Not only foo
and bar
, you may also notice small
and large
is sorted. If you want foo
on top, you may need to sort
them again using sortlevel
. If you are expecting output as in example here, then sorting on A
and C
both may be needed.
table.sortlevel(["A","B"], ascending= [False,True], sort_remaining=False, inplace=True)
table.sortlevel(["C"], axis=1, ascending=False, sort_remaining=False, inplace=True)
print(table)
Output:
C small large
A B
foo one 1.0 4.0
two 6.0 NaN
bar one 5.0 4.0
two 6.0 7.0
To remove index names A
, B
and C
:
table.columns.name = None
table.index.names = (None, None)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With