
How to get a one-hot encoded table like the one below

Tags:

python

pandas

I am trying to get my table into the following form. For some reason, I could not get my pivot code working.

import pandas as pd

df = pd.DataFrame([('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')], columns=['user', 'val'])


df 
---
user    val
a      f1
a      f2
a      f3
b      f4
c      f2
c      f4 


Desired output:

user    f1  f2  f3  f4
a       1   1   1   0
b       0   0   0   1
c       0   1   0   1
asked Dec 23 '22 at 10:12 by learner


2 Answers

Option 1
get_dummies with groupby + sum

df.set_index('user').val.str.get_dummies().groupby(level=0).sum()

      f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1
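A self-contained sketch of Option 1, assuming pandas 2.x. There the `level` argument of `DataFrame.sum` no longer exists (it was deprecated in 1.3 and removed in 2.0), so `groupby(level=0).sum()` collapses the repeated user rows instead:

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# One 0/1 column per distinct value of 'val'; rows are still indexed by 'user',
# so a user with several values appears on several rows
dummies = df.set_index('user')['val'].str.get_dummies()

# Collapse the repeated 'user' index labels into a single row per user
one_hot = dummies.groupby(level=0).sum()
print(one_hot)
```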

Option 2
groupby + value_counts + unstack

df.groupby('user').val.value_counts().unstack(fill_value=0)

val   f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1
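Option 2 as a standalone script, sketched on the same sample data. `value_counts` produces a Series indexed by (user, val) pairs, and `unstack` pivots the inner `val` level into columns:

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Count each (user, val) pair; the result has a two-level index
counts = df.groupby('user')['val'].value_counts()

# Move the 'val' level into columns; pairs that never occur become 0, not NaN
table = counts.unstack(fill_value=0)
print(table)
```

Unlike Option 1, the columns keep the index-level name `val`, which is why that label appears in the header of the output.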

Option 3
pivot_table with size as the aggfunc.

df.pivot_table(index='user', columns='val', aggfunc='size', fill_value=0)

val   f1  f2  f3  f4
user                
a      1   1   1   0
b      0   0   0   1
c      0   1   0   1
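Option 3 as a standalone script, again on the same sample data. With no `values` column, `aggfunc='size'` counts the rows falling into each (user, val) cell, and `fill_value=0` fills the cells with no rows:

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Rows: users; columns: distinct 'val' values; cells: number of matching rows
table = df.pivot_table(index='user', columns='val', aggfunc='size', fill_value=0)
print(table)
```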
answered Dec 26 '22 at 00:12 by cs95


It seems pd.crosstab(df['user'], df['val']) works too.
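A sketch of the `crosstab` route. Like the options above, `crosstab` counts occurrences, so a (user, val) pair that appears twice yields a 2 rather than a one-hot 1. The extra `('a', 'f1')` row below is hypothetical data added only to show this, and `clip(upper=1)` turns the counts back into 0/1 indicators:

```python
import pandas as pd

df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Hypothetical duplicate pair, added only to illustrate counts vs. indicators
dup = pd.concat([df, pd.DataFrame([('a', 'f1')], columns=['user', 'val'])],
                ignore_index=True)

# Frequency table: rows are users, columns are distinct 'val' values
counts = pd.crosstab(dup['user'], dup['val'])

# Cap every count at 1 so each cell is a presence/absence flag
one_hot = counts.clip(upper=1)
print(one_hot)
```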

answered Dec 25 '22 at 22:12 by learner