Say we have a DataFrame that looks like this:
    day_of_week  ice_cream   count  proportion
0   Friday       vanilla     638    0.094473
1   Friday       chocolate   2048   0.663506
2   Friday       strawberry  4088   0.251021
3   Monday       vanilla     448    0.079736
4   Monday       chocolate   2332   0.691437
5   Monday       strawberry  441    0.228828
6   Saturday     vanilla     24     0.073350
7   Saturday     chocolate   244    0.712930
...
I want a new DataFrame that collapses onto day_of_week as an index so it looks like this:
    day_of_week vanilla    chocolate   strawberry
0   Friday      0.094473   0.663506    0.251021 
1   Monday      0.079736   0.691437    0.228828
2   Saturday    ...        ...         ...
What's the cleanest way I can implement this?
DataFrame.pivot() reshapes a DataFrame according to given index and column values. It does not support data aggregation; passing multiple values columns results in a MultiIndex in the columns. Its index parameter is the column to use for the new frame's index; if None, the existing index is used.
df.pivot_table is the correct solution:
In[31]: df.pivot_table(values='proportion', index='day_of_week', columns='ice_cream').reset_index()
Out[31]: 
    ice_cream day_of_week  chocolate  strawberry   vanilla
0              Friday   0.663506    0.251021  0.094473
1              Monday   0.691437    0.228828  0.079736
2            Saturday   0.712930         NaN  0.073350
If you leave out reset_index(), the result keeps day_of_week as its index, which might be more useful for you.
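For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the visible rows from the question and applies the same pivot_table call. The variable name wide and the rename_axis step are my additions, not part of the answer above; the rename only drops the leftover "ice_cream" label that shows up in the header of the output.

```python
import pandas as pd

# Rows copied from the question; Saturday has no strawberry row in the sample.
df = pd.DataFrame({
    "day_of_week": ["Friday"] * 3 + ["Monday"] * 3 + ["Saturday"] * 2,
    "ice_cream": ["vanilla", "chocolate", "strawberry"] * 2 + ["vanilla", "chocolate"],
    "proportion": [0.094473, 0.663506, 0.251021,
                   0.079736, 0.691437, 0.228828,
                   0.073350, 0.712930],
})

wide = (
    df.pivot_table(values="proportion", index="day_of_week", columns="ice_cream")
      .reset_index()              # turn day_of_week back into an ordinary column
      .rename_axis(columns=None)  # assumption: you do not want the "ice_cream" axis label
)
print(wide)
```

The flavour columns come out in alphabetical order, and the missing Saturday/strawberry combination is filled with NaN.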
Note that a pivot table necessarily performs a dimensionality reduction when the values column is not a function of the tuple (index, columns). If the same (index, columns) pair appears more than once with different values, pivot_table collapses those values into a single one using an aggregation function, which is mean by default.
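To make the aggregation point concrete, here is a small sketch with made-up data (the frame dup and its numbers are hypothetical, not from the question) in which the (Friday, vanilla) pair occurs twice. It shows the default mean, an alternative aggfunc, and that plain pivot() refuses the duplicated pair.

```python
import pandas as pd

# Hypothetical data: (Friday, vanilla) appears twice with different values.
dup = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Monday"],
    "ice_cream": ["vanilla", "vanilla", "vanilla"],
    "proportion": [0.10, 0.30, 0.08],
})

# Default behaviour: duplicated entries are averaged.
mean_table = dup.pivot_table(values="proportion", index="day_of_week",
                             columns="ice_cream")

# Another aggregation can be requested through aggfunc.
max_table = dup.pivot_table(values="proportion", index="day_of_week",
                            columns="ice_cream", aggfunc="max")

# pivot() does not aggregate, so the duplicated pair raises ValueError.
try:
    dup.pivot(index="day_of_week", columns="ice_cream", values="proportion")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(mean_table)
print(max_table)
print("pivot() rejected duplicates:", pivot_failed)
```

If a silent mean would hide a data problem, checking for duplicated (index, columns) pairs first, or using pivot() so that duplicates fail loudly, may be the safer choice.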
You are looking for pivot_table:
df = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values = 'proportion')
You get:
ice_cream   chocolate   strawberry  vanilla
day_of_week         
Friday      0.663506    0.251021    0.094473
Monday      0.691437    0.228828    0.079736
Saturday    0.712930    NaN         0.073350