Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert Pandas groupby group into columns

I'm trying to group a Pandas dataframe by two separate group types, A_Bucket and B_Bucket, and convert each A_Bucket group into a column. I get the groups as such:

grouped = my_new_df.groupby(['A_Bucket','B_Bucket'])

I want the A_Bucket group to be in columns and the B_Bucket group to be the indices. 'A' has about 20 values and B has about 20 values, so there are a total of about 400 groups.

When I print grouped and its type I get:

type of grouped2 = <class 'pandas.core.groupby.DataFrameGroupBy'>
A_Bucket  B_Bucket
0.100      100.0            5.418450
           120.0           18.061367
0.125      80.0             3.100920
           100.0           14.137063
           120.0           30.744823
           140.0           38.669950
           160.0           48.303129
           180.0           74.576333
           200.0          125.119950
0.150      60.0             0.003200
           80.0             2.274807
           100.0            5.350074
           120.0           23.272970
           140.0           40.131780
           160.0           47.036912
           180.0           72.438978
           200.0          117.365480

So A_Bucket group 0.100 has only 2 values, but 0.125 has 7. I want a dataframe like this:

        0.1     0.125      0.15
80      NaN     3.10092    2.274807
100     5.41845 14.137063  5.350074
120     18.0613 30.744823  23.27297
140     NaN     38.66995   40.13178
160     NaN     48.303129  47.036912
180     NaN     74.576333  72.438978
200     NaN     125.11995  NaN

I saw this question: Pandas groupby result into multiple columns

but I don't understand the syntax, and it doesn't arrange the first group into columns like I need. I also want this to work for more than one output column.

How do I do this?

like image 632
HonestMath Avatar asked Sep 05 '16 01:09

HonestMath


1 Answers

If I understand you correctly, you are trying to reshape your data frame instead of grouping by summary, in this case you can use set_index() and unstack():

df.set_index(["A_Bucket", "B_Bucket"]).unstack(level=0)

#              Value
# A_Bucket     0.100     0.125       0.150
# B_Bucket          
#     60.0       NaN       NaN    0.003200
#     80.0       NaN  3.100920    2.274807
#    100.0  5.418450 14.137063    5.350074
#    120.0 18.061367 30.744823   23.272970
#    140.0       NaN 38.669950   40.131780
#    160.0       NaN 48.303129   47.036912
#    180.0       NaN 74.576333   72.438978
#    200.0       NaN 125.119950 117.365480

If you indeed have done the summary after grouping by, you can still do df.groupby(['A_Bucket', 'B_Bucket']).mean().unstack(level=0)

like image 140
Psidom Avatar answered Sep 20 '22 07:09

Psidom