How to crosstab a pandas dataframe when one variable (column) is a list of varying length

How can I generate a cross-tabulation (crosstab) from the following dataframe:

import pandas as pd
dat = pd.read_csv('data.txt', sep=',')
dat.head(6)

  Factor1 Factor2
0       A       X
1       B       X
2       A     X|Y
3       B     X|Y
4       A   X|Y|Z
5       B   X|Y|Z

# Split each pipe-delimited string into a list of labels
dat['Factor2'] = dat['Factor2'].str.split('|')
dat.head(6)

  Factor1    Factor2
0       A        [X]
1       B        [X]
2       A     [X, Y]
3       B     [X, Y]
4       A  [X, Y, Z]
5       B  [X, Y, Z]

The resulting pd.crosstab() should look like this:

  X Y Z
A 3 2 1
B 3 2 1
striatum asked Dec 05 '25 06:12



1 Answer

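We can use str.get_dummies to convert the Factor2 column to indicator variables, then group the indicator variables by Factor1 and aggregate with sum. This works on the original pipe-delimited strings, so the split into lists shown in the question is not needed.

dat['Factor2'].str.get_dummies('|').groupby(dat['Factor1']).sum()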

         X  Y  Z
Factor1         
A        3  2  1
B        3  2  1
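
For anyone who wants to try this without data.txt, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the six rows shown in the question, which is an assumption about what the file contains. It also shows one possible alternative for the case where Factor2 already holds lists: explode the lists into one row per label and pass the result to pd.crosstab. The variable names (by_dummies, lists, long_form, by_crosstab) are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's dat.head(6) output (assumed, not read from data.txt).
dat = pd.DataFrame({
    "Factor1": ["A", "B", "A", "B", "A", "B"],
    "Factor2": ["X", "X", "X|Y", "X|Y", "X|Y|Z", "X|Y|Z"],
})

# Approach 1: indicator columns straight from the pipe-delimited strings,
# then sum the indicators within each Factor1 group.
by_dummies = dat["Factor2"].str.get_dummies("|").groupby(dat["Factor1"]).sum()

# Approach 2: if Factor2 already holds lists, expand to one row per label
# and cross-tabulate the resulting long-form frame.
lists = dat.assign(Factor2=dat["Factor2"].str.split("|"))
long_form = lists.explode("Factor2")
by_crosstab = pd.crosstab(long_form["Factor1"], long_form["Factor2"])

print(by_dummies)
print(by_crosstab)
```

Both approaches should give the same counts on this sample. The explode route needs pandas 0.25 or later and may be the more natural choice when the column has already been split.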
Shubham Sharma answered Dec 06 '25 22:12
