
Generate a crosstab type dataframe with a binary count value in pandas

Tags: python, pandas

I have a pandas DataFrame like this:

UIID  ISBN
a      12
b      13

I want to pair each UIID with every ISBN and add a count column to the DataFrame: 1 if the pair occurs in the original data, 0 otherwise.

UIID ISBN Count
 a     12   1
 a     13   0
 b     12   0
 b     13   1

How can this be done in pandas? I know the crosstab function computes the same counts, but I want the data in this long format.

Pranay Pant asked Mar 05 '23 12:03


1 Answer

Use crosstab with melt:

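import pandas as pd

df = pd.DataFrame({'UIID': ['a', 'b'], 'ISBN': [12, 13]})

# Count each (UIID, ISBN) pair, then reshape from wide to long.
# The result goes into df1 so the original df stays available.
df1 = pd.crosstab(df['UIID'], df['ISBN']).reset_index().melt('UIID', value_name='count')
print(df1)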
  UIID ISBN  count
0    a   12      1
1    b   12      0
2    a   13      0
3    b   13      1
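
The melted result comes out grouped by ISBN rather than in the row order shown in the question. A minimal sketch of one way to get the requested layout, assuming the two-row sample data from the question (the names wide and long_df are only illustrative):

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({'UIID': ['a', 'b'], 'ISBN': [12, 13]})

# Wide table: one row per UIID, one column per ISBN, cells hold counts
wide = pd.crosstab(df['UIID'], df['ISBN'])

# Long table: one row per (UIID, ISBN) pair
long_df = wide.reset_index().melt(id_vars='UIID', var_name='ISBN', value_name='Count')

# Sort so the rows appear as a/12, a/13, b/12, b/13
long_df = long_df.sort_values(['UIID', 'ISBN'], ignore_index=True)
print(long_df)
```

Sorting is optional; it only changes the row order, not the counts.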

Alternative solution with GroupBy.size and reindex by MultiIndex.from_product:

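# Count only the (UIID, ISBN) pairs that actually occur
s = df.groupby(['UIID','ISBN']).size()
# Build every possible (UIID, ISBN) combination from the observed values
mux = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
# Missing pairs get a count of 0; df2 keeps the original df intact
df2 = s.reindex(mux, fill_value=0).reset_index(name='count')
print(df2)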
  UIID  ISBN  count
0    a    12      1
1    a    13      0
2    b    12      0
3    b    13      1
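
Both approaches count occurrences, so a pair that appears more than once would get a value above 1. If a strict 0/1 flag is wanted, the counts can be thresholded. A small sketch, assuming hypothetical sample data with one repeated pair (the names counts and binary are illustrative):

```python
import pandas as pd

# Hypothetical data: the pair (a, 12) occurs twice
df = pd.DataFrame({'UIID': ['a', 'a', 'b'], 'ISBN': [12, 12, 13]})

# Raw counts for every (UIID, ISBN) combination, 0 where a pair is absent
s = df.groupby(['UIID', 'ISBN']).size()
mux = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
counts = s.reindex(mux, fill_value=0)

# Turn raw counts into a binary indicator: 1 if the pair occurs at all
binary = counts.gt(0).astype(int).reset_index(name='count')
print(binary)
```

With the two-row data from the question every pair occurs at most once, so this step changes nothing there.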
jezrael answered Apr 09 '23 05:04