Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to expand the data based on the single column in python(transpose)?

Tags:

python

pandas

I have a dataset like below and i need all the different weights for each category in single row and the count

Sample_data

  category  weights
1  aa        3.2
2  aa        2.2
3  aa        4.2
4  bb        3.5
5  bb        4.5
6  aa        0.5
7  cc        0.6
8  bb        7.5
9  cc        6.6
10 dd        2.2
11 aa        3.3
12 bb        4.4
13 cc        5.5
14 dd        6.6

And what i need is the count of each unique category and the different weights of each category in the same row.

Expected output:

 category count  weight1  weight2  weight3  weight4  weight5   
1 aa      5      3.2      2.2      4.2      0.5      3.3
2 bb      4      3.5      4.5      7.5      4.4
3 cc      3      0.6      6.6      5.5
4 dd      2      2.2      6.6

I thought

sampledata['category'].groupby(level = 0)   

will work but it is not. Can some one help me how to do this in python.

like image 571
Surendra babu Pasumarhti Avatar asked Nov 19 '25 06:11

Surendra babu Pasumarhti


1 Answers

I could probably shorten this but the following works:

In [51]:

cat = df.groupby('category')['weights'].agg({'count':'count', 'weight_cat':lambda x: list(x)}).reset_index()
cat
Out[51]:
  category  count                 weight_cat
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]
1       bb      4       [3.5, 4.5, 7.5, 4.4]
2       cc      3            [0.6, 6.6, 5.5]
3       dd      2                 [2.2, 6.6]
In [52]:

cat = cat.join(cat['weight_cat'].apply(lambda x: pd.Series(x)))
cat
Out[52]:
  category  count                 weight_cat    0    1    2    3    4
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]  3.2  2.2  4.2  0.5  3.3
1       bb      4       [3.5, 4.5, 7.5, 4.4]  3.5  4.5  7.5  4.4  NaN
2       cc      3            [0.6, 6.6, 5.5]  0.6  6.6  5.5  NaN  NaN
3       dd      2                 [2.2, 6.6]  2.2  6.6  NaN  NaN  NaN
In [68]:

rename_cols = [col for col in cat if type(col) == int]
rename_weight_cols = ['weight'+str(col + 1) for col in rename_cols]
d = dict(zip(rename_cols, rename_weight_cols))
cat.rename(columns = d,inplace=True)
cat
Out[68]:
  category  count                 weight_cat  weight1  weight2  weight3  \
0       aa      5  [3.2, 2.2, 4.2, 0.5, 3.3]      3.2      2.2      4.2   
1       bb      4       [3.5, 4.5, 7.5, 4.4]      3.5      4.5      7.5   
2       cc      3            [0.6, 6.6, 5.5]      0.6      6.6      5.5   
3       dd      2                 [2.2, 6.6]      2.2      6.6      NaN   

   weight4  weight5  
0      0.5      3.3  
1      4.4      NaN  
2      NaN      NaN  
3      NaN      NaN 

So what the above does is first group on the 'category' column and perform an aggregation on the weight column, we create a count column and then we turn all the values for that group into a list and add this.

I then call apply on that list to turn it into a Series, this will auto generate the names of the columns 0..4.

I then create a dict to rename the columns to weight1 through to 5 as desired.

like image 79
EdChum Avatar answered Nov 21 '25 19:11

EdChum



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!