Say we have a DataFrame that looks like this:
  day_of_week   ice_cream  count  proportion
0      Friday     vanilla    638    0.094473
1      Friday   chocolate   2048    0.663506
2      Friday  strawberry   4088    0.251021
3      Monday     vanilla    448    0.079736
4      Monday   chocolate   2332    0.691437
5      Monday  strawberry    441    0.228828
6    Saturday     vanilla     24    0.073350
7    Saturday   chocolate    244    0.712930
...
I want a new DataFrame that collapses onto day_of_week as an index, so it looks like this:
  day_of_week   vanilla  chocolate  strawberry
0      Friday  0.094473   0.663506    0.251021
1      Monday  0.079736   0.691437    0.228828
2    Saturday       ...        ...         ...
What's the cleanest way I can implement this?
df.pivot_table is the correct solution:
In[31]: df.pivot_table(values='proportion', index='day_of_week', columns='ice_cream').reset_index()
Out[31]:
ice_cream day_of_week  chocolate  strawberry   vanilla
0              Friday   0.663506    0.251021  0.094473
1              Monday   0.691437    0.228828  0.079736
2            Saturday   0.712930         NaN  0.073350
If you leave out reset_index(), it will return a DataFrame indexed by day_of_week, which might be more useful for you.
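To make the difference concrete, here is a minimal sketch that rebuilds the rows shown in the question (the elided rows are simply left out, and the variable names `indexed` and `flat` are my own) and compares the result with and without reset_index():

```python
import pandas as pd

# Rows copied from the question; the elided rows ("...") are not invented here.
df = pd.DataFrame({
    "day_of_week": ["Friday"] * 3 + ["Monday"] * 3 + ["Saturday"] * 2,
    "ice_cream": ["vanilla", "chocolate", "strawberry"] * 2 + ["vanilla", "chocolate"],
    "count": [638, 2048, 4088, 448, 2332, 441, 24, 244],
    "proportion": [0.094473, 0.663506, 0.251021, 0.079736,
                   0.691437, 0.228828, 0.073350, 0.712930],
})

# Without reset_index(): day_of_week is the index of the result.
indexed = df.pivot_table(values="proportion", index="day_of_week", columns="ice_cream")

# With reset_index(): day_of_week becomes an ordinary column again.
flat = indexed.reset_index()

# Label-based lookup works directly on the indexed frame.
print(indexed.loc["Monday", "chocolate"])
print(flat.columns.tolist())
```

With the indexed version you can look up a day by label; the flat version is closer to the layout asked for in the question.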
Note that a pivot table necessarily performs a dimensionality reduction when the values column is not a function of the tuple (index, columns). If there are multiple (index, columns) pairs with different values, pivot_table brings the dimensionality down to one by using an aggregation function, by default mean.
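A small sketch of that behaviour, using made-up data (the `dupes` frame below is hypothetical, not from the question) in which one (day_of_week, ice_cream) pair appears twice. For contrast it also calls DataFrame.pivot, which does no aggregation and so rejects the duplicate pair:

```python
import pandas as pd

# Hypothetical data: ("Friday", "vanilla") appears twice with different values.
dupes = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Friday"],
    "ice_cream": ["vanilla", "vanilla", "chocolate"],
    "proportion": [0.10, 0.30, 0.60],
})

# pivot_table collapses the duplicate pair with the default aggregation (mean).
mean_table = dupes.pivot_table(values="proportion", index="day_of_week",
                               columns="ice_cream")

# A different aggregation can be requested through aggfunc.
sum_table = dupes.pivot_table(values="proportion", index="day_of_week",
                              columns="ice_cream", aggfunc="sum")

# DataFrame.pivot does not aggregate, so the duplicate pair raises ValueError.
try:
    dupes.pivot(index="day_of_week", columns="ice_cream", values="proportion")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(mean_table)
print(sum_table)
print(pivot_failed)
```

If your data is guaranteed to have one row per pair, the choice of aggregation does not matter, since the mean of a single value is that value.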
You are looking for pivot_table:
df = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values='proportion')
You get:
ice_cream    chocolate  strawberry   vanilla
day_of_week
Friday        0.663506    0.251021  0.094473
Monday        0.691437    0.228828  0.079736
Saturday      0.712930         NaN  0.073350
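The columns come back in alphabetical order and carry the ice_cream label. If you want the exact layout from the question (vanilla, chocolate, strawberry, with day_of_week as a plain column), something like the following should work. It is only a sketch on a subset of the question's rows, and the names `wide` and `result` are my own:

```python
import pandas as pd

# A subset of the question's rows, enough to show the reshaping.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Friday", "Monday", "Monday", "Monday"],
    "ice_cream": ["vanilla", "chocolate", "strawberry"] * 2,
    "proportion": [0.094473, 0.663506, 0.251021, 0.079736, 0.691437, 0.228828],
})

wide = pd.pivot_table(df, index="day_of_week", columns="ice_cream", values="proportion")

result = (
    wide[["vanilla", "chocolate", "strawberry"]]  # flavour order from the question
    .rename_axis(columns=None)                    # drop the "ice_cream" label on the columns
    .reset_index()                                # day_of_week back to a regular column
)

print(result)
```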