I am using Python pandas. I have a column of comma-separated names, and I would like to build a co-occurrence matrix (a cross-tabulation) of the names in that column.
E.g., I have the following input:
1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas
And I would like to have the following output:
        Andi  Thomas  Cindy
Andi       1       0      1
Thomas     0       1      2
Cindy      1       2      1
That is, the combination of Andi and Thomas does not appear in the data, but Cindy and Thomas appear together twice.
Does anybody have an idea how I could handle this? That would be really great!
Many thanks and regards,
Andi
The crosstab() function computes a simple cross-tabulation of two (or more) factors. By default it computes a frequency table of the factors, unless an array of values and an aggregation function are passed. The index argument gives the values to group by in the rows, and the columns argument gives the values to group by in the columns.
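As a rough illustration of the description above, here is a minimal sketch of crosstab() on the question's data in "long" form (one row per name). The Series names `row` and `person` are made up for this example and are not from the original post.

```python
import pandas as pd

# Hypothetical long-form version of the question's data:
# one entry per (input row, person) pair.
rows = pd.Series([1, 2, 2, 3, 3, 4, 4], name="row")
people = pd.Series(
    ["Andi", "Andi", "Cindy", "Thomas", "Cindy", "Cindy", "Thomas"],
    name="person",
)

# Default behaviour: a frequency table of the two factors.
counts = pd.crosstab(index=rows, columns=people)
print(counts)
```

The result has one row per input row and one column per person, which is the same kind of indicator table the answer builds with get_dummies.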
With a basic crosstab, you would have to go back to the program and create a separate crosstab with the information on individual products. Pivot tables let the user filter through their data, add or remove custom fields, and change the appearance of their report.
You can generate the dummy columns first:
df['A'].str.get_dummies(', ')
Out:
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1
And use that in the dot product:
tab = df['A'].str.get_dummies(', ')
tab.T.dot(tab)
Out:
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
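To see why the dot product gives co-occurrence counts, here is a self-contained sketch. The DataFrame construction is an assumption: the answer only refers to a column 'A', so the frame below is rebuilt from the question's four rows.

```python
import pandas as pd

# Assumed input: the question's data in a column named 'A', as in the answer.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})

# One 0/1 indicator column per name.
tab = df["A"].str.get_dummies(", ")

# Entry (i, j) of tab.T.dot(tab) is the sum over rows of tab[i] * tab[j],
# i.e. the number of rows in which both names appear.
co_occurrence = tab.T.dot(tab)

# The same count for one pair, computed by hand.
manual = int((tab["Cindy"] * tab["Thomas"]).sum())
print(co_occurrence)
print(manual)
```

Because the indicators are 0 or 1, the product of two columns is 1 only in rows where both names are present, so summing it counts the shared rows.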
Diagonal entries give the number of occurrences for each person. If you need to set the diagonal to 1, there are several alternatives. One of them is np.fill_diagonal from numpy:
import numpy as np
co_occurrence = tab.T.dot(tab)
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out:
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
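Filling the diagonal through `.values` relies on that array being a writable view of the frame, which may not hold in every pandas setup (for example with Copy-on-Write enabled). A possible variant, offered only as a sketch, fills the diagonal on an explicit copy and rebuilds the frame. It also reorders the labels to match the layout asked for in the question. The input frame and the names `values`, `result` and `order` are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Assumed input: the question's data in a column named 'A'.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})
tab = df["A"].str.get_dummies(", ")
co_occurrence = tab.T.dot(tab)

# Work on an explicit, writable copy of the underlying array.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 1)
result = pd.DataFrame(values, index=co_occurrence.index, columns=co_occurrence.columns)

# Optional: use the row/column order from the question.
order = ["Andi", "Thomas", "Cindy"]
result = result.reindex(index=order, columns=order)
print(result)
```

With the question's data this reproduces the requested table, including the 2 for Cindy and Thomas and the 0 for Andi and Thomas.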