Pandas Crosstabulation and counting

Tags: python, pandas

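I am using pandas in Python. I have a column of strings, each holding a comma-separated list of names, and I would like to build a cross-tabulation that counts how often each pair of names appears together in the same row.

E.g., I have the following input: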

1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas

And I would like to have the following output:

          Andi  Thomas  Cindy
    Andi    1     0      1
    Thomas  0     1      2
    Cindy   1     2      1

Here the combination of Andi and Thomas does not appear in the data, while Cindy and Thomas appear together twice.

Does anybody have an idea how I could do this? That would be really great!

Many thanks and regards,

Andi

asked Jul 10 '17 16:07

Andi Maier

1 Answer

Assuming the strings are in a column named A, you can generate the dummy (indicator) columns first:

df['A'].str.get_dummies(', ')
Out: 
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1
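
For anyone who wants to reproduce this step, here is a minimal sketch. The DataFrame is rebuilt from the question's four input rows, and the column name 'A' is an assumption carried over from the snippet above.

```python
import pandas as pd

# Hypothetical frame reproducing the question's input; the column name 'A' is assumed.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})

# Split each string on ', ' and build one 0/1 indicator column per distinct name.
tab = df['A'].str.get_dummies(', ')
print(tab)
```

Note that the separator must match the data exactly: with ', ' a row written as 'Andi,Cindy' (no space) would not be split.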

And use that in the dot product:

tab = df['A'].str.get_dummies(', ')

tab.T.dot(tab)
Out: 
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
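
The reason this works: entry (i, j) of the product sums, over all rows, the product of indicator i and indicator j, which is 1 only when both names appear in that row. A rough cross-check, counting pairs directly with itertools on the question's input (the list name rows and the counter pair_counts are only illustrative), might look like this:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# The question's four input rows (variable name is illustrative).
rows = ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']

tab = pd.Series(rows).str.get_dummies(', ')
co = tab.T.dot(tab)

# Count each unordered pair of names once per row it appears in.
pair_counts = Counter()
for row in rows:
    names = sorted(set(row.split(', ')))
    pair_counts.update(combinations(names, 2))

# The off-diagonal entries of the dot product agree with the direct counts.
for (a, b), n in pair_counts.items():
    assert co.loc[a, b] == n == co.loc[b, a]
```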

Diagonal entries will give you the number of occurrences for each person. If you need to set the diagonals to 1, there are several alternatives. One of them is np.fill_diagonal from numpy.

import numpy as np

co_occurrence = tab.T.dot(tab)
# Overwrite the diagonal in place through the underlying array
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out: 
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
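
Writing through .values relies on it returning a writable view of the frame's data, which may not hold in newer pandas versions with Copy-on-Write enabled. A sketch of a variant that works on an explicit copy instead, and then reorders the labels to match the layout in the question (the names values, result and order are only illustrative), could be:

```python
import numpy as np
import pandas as pd

# Hypothetical frame reproducing the question's input; the column name 'A' is assumed.
df = pd.DataFrame({'A': ['Andi', 'Andi, Cindy', 'Thomas, Cindy', 'Cindy, Thomas']})
tab = df['A'].str.get_dummies(', ')
co_occurrence = tab.T.dot(tab)

# Work on an explicit writable copy rather than modifying the frame's data in place.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 1)
result = pd.DataFrame(values, index=co_occurrence.index, columns=co_occurrence.columns)

# Reorder rows and columns to the order used in the question.
order = ['Andi', 'Thomas', 'Cindy']
result = result.reindex(index=order, columns=order)
print(result)
```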

answered Sep 30 '22 04:09

ayhan
