I have a dataframe with 2 variables: ID
and outcome
. I'm trying to groupby
ID
first, and count the number of unique values of outcome
within that ID
.
df
ID outcome
1 yes
1 yes
1 yes
2 no
2 yes
2 no
Expected output:
ID yes no
1 3 0
2 1 2
My code df[['PID', 'outcome']].groupby('PID')['outcome'].nunique()
gives the number of the unique value itself, such that:
ID
1 2
2 2
But I need the counts of the yes
and no
, how can I achieve that? Thanks!
How about pd.crosstab
?
In [1217]: pd.crosstab(df.ID, df.outcome)
Out[1217]:
outcome no yes
ID
1 0 3
2 2 1
Option 2pd.factorize
+ np.bincount
This is convoluted and painful... but very fast.
fi, ui = pd.factorize(df.ID.values)
fo, uo = pd.factorize(df.outcome.values)
n, m = ui.size, uo.size
pd.DataFrame(
np.bincount(fi * m + fo, minlength=n * m).reshape(n, m),
pd.Index(ui, name='ID'), pd.Index(uo, name='outcome')
)
outcome yes no
ID
1 3 0
2 1 2
Option C
pd.get_dummies(d.ID).T.dot(pd.get_dummies(d.outcome))
no yes
1 0 3
2 2 1
Option IV.
df.groupby(['ID', 'outcome']).size().unstack(fill_value=0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With