I have a test dataframe:
import pandas as pd

df1 = pd.DataFrame({
    "Group1": ["X", "Y", "Y", "X", "Y", "Z", "X", "Y"],
    "Group2": ["A", "C", "A", "B", "C", "C", "B", "A"],
    "Number1": [1, 3, 5, 1, 5, 2, 5, 3],
    "Number2": [6, 2, 6, 2, 7, 2, 6, 8],
})
# Mean of Number1/Number2 per (Group1, Group2), plus "All" margins
df2 = df1.pivot_table(index="Group1", columns="Group2", margins=True)
print(df2)
Output:
         Number1                                 Number2
Group2         A         B         C       All         A         B         C       All
Group1
X            1.0       3.0       NaN  2.333333  6.000000       4.0       NaN  4.666667
Y            4.0       NaN  4.000000  4.000000  7.000000       NaN  4.500000  5.750000
Z            NaN       NaN  2.000000  2.000000       NaN       NaN  2.000000  2.000000
All          3.0       3.0  3.333333  3.125000  6.666667       4.0  3.666667  4.875000
When I call stack on this dataframe, I get this result:
df3 = df2.stack()
print(df3)
Output:
                Number1   Number2
Group1 Group2
X      A       1.000000  6.000000
       All     2.333333  4.666667
       B       3.000000  4.000000
Y      A       4.000000  7.000000
       All     4.000000  5.750000
       C       4.000000  4.500000
Z      All     2.000000  2.000000
       C       2.000000  2.000000
All    A       3.000000  6.666667
       All     3.125000  4.875000
       B       3.000000  4.000000
       C       3.333333  3.666667
How can I prevent stack from sorting the indices so that the order of Group2 remains as A, B, C, All?
Pandas provides various built-in methods for reshaping a DataFrame. Among them, stack() and unstack() are two of the most commonly used for moving levels between the columns and the rows (the index). stack() moves the prescribed level(s) from the columns to the rows; unstack() moves the prescribed level(s) from the rows to the columns.
When we concatenate DataFrames, we can specify the axis (the default is axis=0). axis=0 tells pandas to stack the second DataFrame under the first one, aligning columns by name. axis=1 places the columns of the second DataFrame to the right of the first DataFrame, aligning rows by index.
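To make the two axis values concrete, here is a minimal sketch. The frames `top`, `bottom`, and `right` are made-up examples and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frames, used only to illustrate the axis argument
top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5], "b": [6]})
right = pd.DataFrame({"c": [7, 8]})

# axis=0: rows of `bottom` go under `top`; columns are matched by name
rows = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1: columns of `right` go next to `top`; rows are matched by index
cols = pd.concat([top, right], axis=1)

print(rows)
print(cols)
```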
The DataFrame stack() function stacks the prescribed level(s) from the columns to the index. It returns a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame.
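As a small illustration of that description, the sketch below stacks a frame with single-level columns and then reverses the operation. The frame `wide` and its axis names `row` and `col` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical wide frame with named row and column axes
wide = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.Index(["r1", "r2"], name="row"),
)
wide.columns.name = "col"

# stack(): the column labels become a new inner-most index level,
# so the result here is a Series indexed by (row, col)
long = wide.stack()

# unstack(): the inner-most index level moves back to the columns
back = long.unstack()

print(long)
print(back)
```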
IIUC, we need pd.Index.get_level_values and DataFrame.reindex:
df2.stack().reindex(df2.columns.get_level_values(1).unique(), level='Group2')
                Number1   Number2
Group1 Group2
X      A       1.000000  6.000000
       B       3.000000  4.000000
       All     2.333333  4.666667
Y      A       4.000000  7.000000
       C       4.000000  4.500000
       All     4.000000  5.750000
Z      C       2.000000  2.000000
       All     2.000000  2.000000
All    A       3.000000  6.666667
       B       3.000000  4.000000
       C       3.333333  3.666667
       All     3.125000  4.875000
We can use level='Group2' or level=1.
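For convenience, here is the approach above as one self-contained script, written as a sketch rather than a definitive recipe. The variable names `group2_order` and `df3` are introduced here for readability. The `dropna(how="all")` call is an assumption added on top of the answer: whether stack() sorts or keeps all-NaN rows appears to differ between pandas versions, and dropping those rows keeps the result in line with the output shown above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Group1": ["X", "Y", "Y", "X", "Y", "Z", "X", "Y"],
    "Group2": ["A", "C", "A", "B", "C", "C", "B", "A"],
    "Number1": [1, 3, 5, 1, 5, 2, 5, 3],
    "Number2": [6, 2, 6, 2, 7, 2, 6, 8],
})
df2 = df1.pivot_table(index="Group1", columns="Group2", margins=True)

# Order of the Group2 labels as they appear in df2's columns: A, B, C, All
group2_order = df2.columns.get_level_values(1).unique()

# Stack, then reorder the Group2 index level to follow the column order.
# dropna(how="all") is an assumption for pandas versions whose stack()
# keeps rows that are entirely NaN.
df3 = (
    df2.stack()
    .reindex(group2_order, level="Group2")
    .dropna(how="all")
)
print(df3)
```

Passing `level=1` instead of `level="Group2"` selects the same index level by position.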