Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby and count the number of unique values (Pandas)

I have a dataframe with 2 variables: ID and outcome. I'm trying to groupby ID first, and count the number of unique values of outcome within that ID.

df
ID    outcome
1      yes
1      yes
1      yes
2      no
2      yes
2      no

Expected output:

ID    yes    no
1      3     0
2      1     2

My code df[['PID', 'outcome']].groupby('PID')['outcome'].nunique() gives the number of the unique value itself, such that:

ID
1   2
2   2

But I need the counts of the yes and no, how can I achieve that? Thanks!

like image 793
Lumos Avatar asked Aug 03 '17 21:08

Lumos


2 Answers

How about pd.crosstab?

In [1217]: pd.crosstab(df.ID, df.outcome)
Out[1217]: 
outcome  no  yes
ID              
1         0    3
2         2    1
like image 174
cs95 Avatar answered Nov 04 '22 00:11

cs95


Option 2
pd.factorize + np.bincount
This is convoluted and painful... but very fast.

fi, ui = pd.factorize(df.ID.values)
fo, uo = pd.factorize(df.outcome.values)

n, m = ui.size, uo.size
pd.DataFrame(
    np.bincount(fi * m + fo, minlength=n * m).reshape(n, m),
    pd.Index(ui, name='ID'), pd.Index(uo, name='outcome')
)

outcome  yes  no
ID              
1          3   0
2          1   2

Option C

pd.get_dummies(d.ID).T.dot(pd.get_dummies(d.outcome))

   no  yes
1   0    3
2   2    1

Option IV.

df.groupby(['ID', 'outcome']).size().unstack(fill_value=0)
like image 42
piRSquared Avatar answered Nov 04 '22 00:11

piRSquared