
pandas pivot table where the column contains a string with multiple categories

I have data in the form:

'cat'     'value'
a         1
a,b       2
a,b,c     3
b,c       2
b         1

which I would like to convert using a pivot table:

'a'  'b'  'c'
1
2    2
3    3    3
     2    2
     1

How do I do this? If I use the pivot command:

df.pivot(columns='cat', values='value')

I get one column per distinct string instead of one per category:

'a' 'a,b' 'a,b,c' 'b,c' 'b' 
1
     2
           3
                   2
                        1

azuric asked Jan 24 '23 12:01


2 Answers

You can split the string into a list, call .explode() so that each category gets its own row, and then pivot as usual:

df['cat'] = df['cat'].str.split(',')
exploded = df.explode('cat')  # one row per (original row, category); index labels repeat
df = exploded.pivot_table(index=exploded.index, columns='cat', values='value')

This outputs:

cat a   b   c
0   1.0 NaN NaN
1   2.0 2.0 NaN
2   3.0 3.0 3.0
3   NaN 2.0 2.0
4   NaN 1.0 NaN

The label cat in the top-left corner is the name of the columns axis, not of the index. If you do not want it, remove it with .rename_axis(columns=None).
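
For reference, the steps above can be put together as a self-contained sketch. The DataFrame is reconstructed from the sample in the question, and the names `exploded` and `wide` are only illustrative. Because each (row, category) pair occurs once, this version uses `pivot` rather than `pivot_table`, so no aggregation is involved; that is a variation on the answer, not a requirement.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "cat": ["a", "a,b", "a,b,c", "b,c", "b"],
    "value": [1, 2, 3, 2, 1],
})

# Split each string into a list, then explode once.
# Exploded rows keep the index label of the row they came from.
exploded = df.assign(cat=df["cat"].str.split(",")).explode("cat")

# Move the original row label into a column named "index",
# pivot on it, then drop both axis names for a clean header.
wide = (
    exploded.reset_index()
    .pivot(index="index", columns="cat", values="value")
    .rename_axis(index=None, columns=None)
)

print(wide)
```

Cells with no matching category come out as NaN, which is why the values are shown as floats.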

Celius Stingher answered Feb 03 '23 08:02


Try str.get_dummies and multiply the result by the value column, then replace 0 with NaN if necessary. Note that the replace step also blanks out any genuine 0 in value.

import numpy as np

df['cat'].str.get_dummies(',').mul(df['value'], axis=0).replace(0, np.nan)

     a    b    c
0  1.0  NaN  NaN
1  2.0  2.0  NaN
2  3.0  3.0  3.0
3  NaN  2.0  2.0
4  NaN  1.0  NaN
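
If value can legitimately be 0, one possible variation is to mask with the indicator columns instead of replacing zeros. The sketch below assumes the same sample data as the question; `spread_values` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def spread_values(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column per category, holding `value`
    where the row lists that category and NaN elsewhere."""
    # 0/1 indicator columns, one per distinct category in the strings.
    dummies = frame["cat"].str.get_dummies(sep=",")
    # Multiply row-wise by value, then keep only cells whose indicator is 1.
    # Unlike replace(0, NaN), this leaves a genuine 0 in `value` intact.
    return dummies.mul(frame["value"], axis=0).where(dummies.astype(bool))


# Sample data reconstructed from the question.
df = pd.DataFrame({
    "cat": ["a", "a,b", "a,b,c", "b,c", "b"],
    "value": [1, 2, 3, 2, 1],
})

result = spread_values(df)
print(result)
```

On the question's data this should give the same table as above; the two approaches only differ when value contains zeros.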

anky answered Feb 03 '23 08:02